Closed Bug 859157 Opened 9 years ago Closed 9 years ago

TBPL + TBPLbot have wrong times for builds

Categories

(Tree Management Graveyard :: TBPL, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: philor)

References

Details

Looks like the timezone isn't being handled correctly - TBPL thinks the build happened 7 hours earlier than it did.

Example from bug 858752 comment #0:
https://tbpl.mozilla.org/php/getParsedLog.php?id=21452006&tree=Mozilla-Central
says this at the top of the page
b2g_mozilla-central_win32_gecko build on 2013-04-04 10:58:20 PDT for push 55f9e3e3dae7

The build json (http://builddata.pub.build.mozilla.org/buildjson/builds-2013-04-05.js.gz, search for 21452006) gives an epoch time:
      "starttime": 1365123500

And 
 $ TZ='US/Pacific' python
 Python 2.7.2 (default, Oct 11 2012, 20:14:37) 
 [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import time
 >>> time.strftime("%c", time.localtime(1365123500))
 'Thu Apr  4 17:58:20 2013'

Which is backed up by the step times in the log and on the buildbot master.
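On Python 3.9 or later, you can run the same conversion without overriding TZ for the whole process by using the standard-library zoneinfo module. This is a sketch, not what was run above. It assumes the host has IANA time zone data, and it uses America/Los_Angeles, the canonical name that US/Pacific aliases:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# "starttime" for build 21452006, taken from the buildjson file above
START = 1365123500

utc = datetime.fromtimestamp(START, tz=timezone.utc)
pacific = utc.astimezone(ZoneInfo("America/Los_Angeles"))

print(utc.isoformat())      # 2013-04-05T00:58:20+00:00
print(pacific.isoformat())  # 2013-04-04T17:58:20-07:00, i.e. 17:58:20 PDT
```

Both lines describe the same instant. The log page's 10:58:20 PDT is exactly 7 hours (25,200 s) earlier than either.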
So, I actually suspect this is due to something Sheeri told me a few days ago...

The generic (db) cluster accidentally had its time zone changed from PT to UTC, and there was (at the time) no known fallout from it. She said it should be easy to revert if needed/desired, but until someone came forward with a regression attributable to that change, it was unlikely to be worth reverting.

As I recall, TBPL uses that generic cluster (our buildbot database is on a separate DB server that did not change TZ), so that is my first theory here.
Indeed, that's very likely it.

What are our options?
From what I see we have 3 choices:

1) Change the code so it becomes aware of time zones and converts if necessary
2) update buildbot to UTC (I prefer this)
3) change generic back to Pacific (not preferred, lots of other dbs on generic and it's a tough sell)
(In reply to Sheeri Cabral [:sheeri] from comment #3)
> 1) Change the code so it becomes aware of time zones and converts if
> necessary

It kind of is already, and it has some code to automatically adjust to DST. But it is still a pita to work with -_-

> 2) update buildbot to UTC (I prefer this)

UTC all around :-) \o/
Could you point out where buildbot isn't in UTC? AFAICT the JSON that tbpl imports uses epoch timestamps (http://builddata.pub.build.mozilla.org/buildjson/builds-4hr.js.gz)
[scabral@buildbot1.db.scl3 ~]$ date
Wed Apr 10 13:17:09 PDT 2013
nthomas, to be more clear: the problem came about when we changed generic's OS timezone from Pacific to UTC. When generic's and buildbot's DB timezones were the same, there was no issue. Now that they're different, there's an issue.
Not sure how the timezone of the buildbot db host comes into it. All our buildbot masters are in Pacific time and that hasn't changed, and buildbot stores time data independent of the master's timezone (using seconds since epoch). If TBPL is getting it wrong when the generic db's timezone changed then that implies comment #4 isn't complete.
so it sounds like either way, changing buildbot to UTC should be fine - either it resolves the problem, meaning comment 4 isn't complete, or it doesn't, but then there's no harm in changing it to UTC.
I think some columns in the status database are mysql DateTime columns
tbpl has zero connection with the buildbot db. None, no access, no way it could. The only thing it does, the only thing it can do, is read http://builddata.pub.build.mozilla.org/buildjson/builds-4hr.js.gz, which is created from the buildbot db. The only times in there are epoch timestamps, and they are correct. A job which started at 20:59 PDT says it started at 1365652740, which is 20:59 PDT. If changing the buildbot db to UTC "fixed" this tbpl problem, the only way that could happen would be by turning builds-4hr.js.gz into a lie, claiming that things are epoch timestamps when in fact they are epoch timestamps plus 25,200 seconds. Without looking at how that lie was propagated, it would be impossible to say whether it would survive the change back to PST next fall.
The point of converting to an America/Los_Angeles datetime in https://hg.mozilla.org/webtools/tbpl/file/1a851b11739b/dataimport/import-buildbot-data.py#l102 was because we thought we were going to insert it into a db running in that timezone, right?
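The 7-hour error in comment 0 can be modeled without a database. This sketch rests on a hypothetical assumption that was not checked against the import script: the importer formats each epoch as a naive America/Los_Angeles wall-clock string, and MySQL interprets that string in the connection's session time zone, which on the new generic server now defaults to UTC. The function names are illustrative only:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")
FMT = "%Y-%m-%d %H:%M:%S"


def importer_writes(epoch: int) -> str:
    """Hypothetical import step: epoch -> naive Pacific wall-clock string."""
    return datetime.fromtimestamp(epoch, tz=PACIFIC).strftime(FMT)


def db_reads_back(wall_clock: str, session_tz) -> int:
    """Model of the DB treating a naive string as local time in session_tz
    and later handing it back as an epoch."""
    naive = datetime.strptime(wall_clock, FMT)
    return int(naive.replace(tzinfo=session_tz).timestamp())


start = 1365123500                                 # from comment 0
stored = importer_writes(start)                    # '2013-04-04 17:58:20'
pt_session = db_reads_back(stored, PACIFIC)        # old generic: Pacific
utc_session = db_reads_back(stored, timezone.utc)  # new generic: UTC

print(pt_session - start)   # 0
print(utc_session - start)  # -25200
# What tbpl would then display, in Pacific time:
print(datetime.fromtimestamp(utc_session, tz=PACIFIC).strftime("%H:%M:%S"))  # 10:58:20
```

Under these assumptions, the offset is 7 hours only while Pacific time is on PDT. During PST it becomes 8 hours (28,800 s), so a fixed correction would not survive the DST change.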
dev has been UTC since mid-December; how come tests didn't pick up the change then?

Here's what happened: the old generic servers were built to the old standard, Pacific time. The new generic server was built in March (just a few weeks ago), because high load was making the old servers unresponsive, and it was built to the new standard, which is UTC. We didn't catch this difference while building it.

We only caught the problem after the new generic database server was in place, because only then could we run a data integrity check, which puts stress on the system. The only differences were in the times. Since we have a project starting in Q3 to move as many DB servers as possible to UTC, we decided to keep this unintentional change in place.

The real problem is described in comment 0.

Are new build times accurate? or still 7 hours off?

Is it a matter of changing the dates from *before* the UTC change to be precise? or *after*? or both?
Tests? We don't have any of those.

Newly added data continues to be wrong; data from before the change continues to be right. Compare https://tbpl.mozilla.org/?rev=b4bfc1c0829c (from before) with https://tbpl.mozilla.org/?rev=2949e808ed33 (from now): click the green B in the Fedora opt row in the right column. The "started nn:nn" in the bottom left of the screen should be a few minutes after the time listed above the push in the left column. That still holds in the before case, but in the now case the start is 7 hours before the push.

The fact that we still pull out the right time for data we put in before the db change implies that we don't need to worry about what we're doing to timestamps when we read them, only when we write them. That makes me suspect that all we need to do is drop the America/Los_Angeles bit in https://hg.mozilla.org/webtools/tbpl/file/1a851b11739b/dataimport/import-buildbot-data.py#l102, stick our UTC time into our UTC db, and we'll be fine (except for having a few weeks of bad data, but because hardly anyone ever looks back on tbpl, that's probably not even worth fixing).
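Here is a sketch of the proposed fix. It assumes (hypothetically) that the importer writes a naive wall-clock string, which MySQL then interprets in the session time zone. Under that assumption, formatting the epoch as UTC instead of America/Los_Angeles and writing it into a UTC session round-trips exactly, for both a PDT-era and a PST-era timestamp, because UTC has no DST. The helper names are illustrative, not tbpl's:

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def importer_writes_utc(epoch: int) -> str:
    """Hypothetical fixed import step: epoch -> naive UTC wall-clock string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(FMT)


def utc_session_reads_back(wall_clock: str) -> int:
    """Model of a UTC session turning the stored string back into an epoch."""
    naive = datetime.strptime(wall_clock, FMT)
    return int(naive.replace(tzinfo=timezone.utc).timestamp())


# One start time during PDT (comment 0) and one during PST (2013-01-01 00:00 UTC).
for epoch in (1365123500, 1356998400):
    assert utc_session_reads_back(importer_writes_utc(epoch)) == epoch
```

This only holds while the session time zone is UTC. On a local install whose MySQL inherits a Pacific system time zone, the same code would be off by 7 or 8 hours in the other direction.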

An even better move would probably be to explicitly set the connection timezone to UTC in https://hg.mozilla.org/webtools/tbpl/file/1a851b11739b/dataimport/import-buildbot-data.py#l330 (however you actually do that, I fell asleep reading the MySQL manual last night) so that a tbpl dev's local copy will work correctly no matter what his system timezone, and thus his inherited db timezone, is.
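Here is a minimal sketch of the connection-level approach. It assumes the importer uses MySQLdb (the mysqlclient package), as the comments here suggest; the helper names and connection parameters are hypothetical. MySQLdb.connect() accepts an init_command, which runs once when the connection opens. The numeric offset '+00:00' is used because named zones such as 'UTC' only work if the server's time zone tables have been loaded. Per the MySQL manual, this matters for TIMESTAMP columns, which are converted between the session time zone and UTC; DATETIME values are stored as given.

```python
# Numeric offset: unlike the name 'UTC', it works even if the server's
# named time zone tables have not been loaded.
UTC_INIT_COMMAND = "SET time_zone = '+00:00'"


def utc_connect_kwargs(**base):
    """Return MySQLdb.connect() keyword arguments with the session pinned to UTC."""
    kwargs = dict(base)
    kwargs["init_command"] = UTC_INIT_COMMAND
    return kwargs


def connect_utc(**base):
    """Open a MySQLdb connection whose session time zone is UTC.

    MySQLdb (from the mysqlclient package) is imported lazily so this
    module can be loaded and tested without the driver installed."""
    import MySQLdb

    return MySQLdb.connect(**utc_connect_kwargs(**base))


# Hypothetical usage (host, user, password and database are placeholders):
# conn = connect_utc(host="localhost", user="tbpl", passwd="secret", db="tbpl")
```

Pinning the session this way makes a developer's local copy behave the same regardless of the host or server time zone.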
I'm not sure if your first sentence is tongue-in-cheek or not, but can you test dropping the America/LA bit (or perhaps changing it to UTC)?

As per https://dev.mysql.com/doc/refman/5.5/en/time-zone-support.html:

-------------
"Per-connection time zones. Each client that connects has its own time zone setting, given by the session time_zone variable. Initially, the session variable takes its value from the global time_zone variable, but the client can change its own time zone with this statement:

mysql> SET time_zone = timezone;"

-------------

The global time_zone variable is:
mysql> SELECT * FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES WHERE VARIABLE_NAME='time_zone';
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| TIME_ZONE     | SYSTEM         |
+---------------+----------------+
1 row in set (0.00 sec)

Which is expected, it's whatever the system is (which is UTC right now).


This can easily be tested by logging in from a machine and running:
SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE VARIABLE_NAME='time_zone';
That will give you the session value for time_zone.
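For a local install, you can script the same check from Python. The helper below is hypothetical and works with any DB-API cursor, for example one from MySQLdb. It reads the @@session.time_zone and @@global.time_zone system variables, which report the same values as the INFORMATION_SCHEMA queries above. It deliberately rejects SYSTEM, because that value follows whatever the host's time zone happens to be, which is how, per this bug, the generic server change reached tbpl:

```python
def session_time_zones(cursor):
    """Return (session, global) time_zone values via a DB-API cursor."""
    cursor.execute("SELECT @@session.time_zone, @@global.time_zone")
    row = cursor.fetchone()
    # Some driver configurations return bytes; normalize to str.
    return tuple(v.decode() if isinstance(v, bytes) else v for v in row)


def assert_utc_session(cursor):
    """Raise RuntimeError unless the session is explicitly pinned to UTC."""
    session_tz, global_tz = session_time_zones(cursor)
    if session_tz not in ("+00:00", "UTC"):
        raise RuntimeError(
            f"session time_zone is {session_tz!r} (global {global_tz!r}); expected UTC"
        )
```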
No, seriously.

tbpl has no automated tests, no QA people, no manual acceptance checklist, no actual dev install (tbpl-dev is actually staging, for historical reasons), and no developers in either the sense of MoCo webdev employees or the sense of people who work on it because that's the thing they do. It has just a few users who patch it when they have to, none of whom, to the best of my knowledge, have any access to anything which would allow "logging in from a machine" other than in their local install.

I know virtually no Python, and have never used pytz, or MySQLdb, or done any MySQL through any other API from Python, but yeah, it does look like I'll have to spend the weekend trying to replace my broken local install of tbpl with a working one, and trying to force it into UTC if it doesn't install that way, and seeing if I can teach myself enough to patch tbpl up into an at least temporarily working state.
Hi Philor,

What an unfortunate situation to be in. I haven't used MySQLdb a lot, but I have done it some, and I'm happy to help if I can. 

nthomas, how urgent is this? does philor really need to work over the weekend for this, or can it wait?
Pushed https://hg.mozilla.org/webtools/tbpl/rev/29cad65aeb7f to see how it goes on tbpl-dev, since it seems to fix the problem in the parts of my local install that are actually working.
Depends on: 861578
(In reply to Phil Ringnalda (:philor) from comment #18)
> Pushed https://hg.mozilla.org/webtools/tbpl/rev/29cad65aeb7f to see how it
> goes on tbpl-dev, since it seems to fix the problem in the parts of my local
> install that are actually working.

lgtm, r+

Sorry for the delay, had a conference Fri/Sat and have been preparing to move house around that. Thank you for landing this :-)
Assignee: nobody → philringnalda
Status: NEW → ASSIGNED
OS: Mac OS X → All
Hardware: x86 → All
Version: other → Trunk
In production and looking fine :-)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Thanx philor!
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard