Closed Bug 537003 Opened 10 years ago Closed 8 years ago
Socorro - fix all timestamps to be timezone aware
with the work to get a Active Daily User report, we've encountered our first timezone related problem. Socorro currently uses the type "timestamp without time zone" as its default datetime type. It saves PST or PDT times in those data columns. What would it take to convert to using UTC intead? Would old data have to be converted? Would column types have to be changed? What code changes would be necessary for literals? How does this affect the reports table parititioning?
Postgresql distinguishes timestamp: resolution (approximately) 1 microsecond with time zone: The zone is used to calculate internal (UTC) representation without time zone: The data is treated as though UTC date: resolution 1 day time zone is allowed, and if present may adjust representation near midnight time: resolution 1 microsecond with time zone: Deprecated because the date (not known) is important to decide whether daylight time is in effect without time zone: Stores microseconds since midnight So, timestamp is what we need to use. Note that while UTC is the obvious choice, any other consistent time zone is feasible. If we chose to make that consistently 'what the clock says in San Francisco, CA', then we would not need to make any change to existing data... unless the ADU data has some other representation, in which case the 1 day granularity precludes an easy fixup to some other time zone.
I think we should get this for 1.7.8 (need a milestone!)
Target Milestone: 2.0 → 1.7.7
I'm afraid that Frank's conclusions are incorrect. What you should be using throughout the database is specifically TIMESTAMP WITH TIME ZONE. Attempts to use a "consistent time zone" such as UTC are vulnerable to both application programming bugs and to system clock misconfiguration. If we store all timestamps as TIMESTAMPTZ, then data can be displayed relative to the user's timezone, or in PST/PDT which has been our standard for reports for some time. Further, with TIMESTAMPTZ it is not necessary for us to attempt to correct time zones at the application end, as long as they are being inserted via a PostgreSQL client which is aware of the client system's local time zone. Unfortunately, this cannot easily be corrected without degraded service during the correction period.
OK, so I've changed the title of the bug to reflect what we really need to do. Specifically, here's an example from the reports table: client_crash_date | timestamp with time zone | date_processed | timestamp without time zone | build_date | timestamp without time zone | started_datetime | timestamp without time zone | completed_datetime | timestamp without time zone | This has a number of issues with it: (1) we're put in a position of comparing a timestamp without at time zone to a timestamp with a time zone, always a risky proposition. (2) We're depending on the implicit understanding that all timestamps are being stored in PSTPDT. Unfortunately, on examination almost every table in the database currently has at least one timestamp-without-time-zone column. So modifying all the data types would be quite time consuming, since PostgreSQL 9.0 insists on rewriting the table when we make this kind of a change (9.1 may not, but it won't be out for a while). Note that this could be done without taking socorro down, at least I think so. It would just affect performance, and require a fair bit of labor while it was going on.
Summary: Socorro - standardize all datetimes to UTC → Socorro - fix all timestamps to be timezone aware
I'd like to change date_processed, started_datetime and completed_datetime to UTC. build_date is based on the string in the 'BuildId', so I don't know if a timezone is even appropriate for it. The client_crash_date comes from the user as an integer. It is then converted into an actual datetime like this: crash_date = datetime.datetime.fromtimestamp(crash_time, Processor.utctz) The application of the UTC timezone is a legacy from the very beginning of this codebase. Is it really appropriate? We'll have to find out.
You're not quite following me. If we change these to timestamptz, it doesn't *matter* what time zone they are "stored" in. They can be retrieved in any time zone. Do note that due to the sheer number of timestamps in the database, fixing this will be a major undertaking to do without downtime.
as part of this bug, I think it is important that we audit and document the source, storage and use of every datetime in the system. We're likely to find a few abominations, but we'll deal with them. If we decide that some changes would be too disruptive, then we'll back off on implementing the changes.
All timestamps from the client are unix time, so they're all UTC by definition, FWIW.
Fixing the timestamps for the future is easy. However, the backfill is quite resource-intensive. It would require rewriting around 300GB of PostgreSQL data, and 20TB of Hbase data in order to correct the timestamps to UTC in the historical data. Doing this rewrite in PostgreSQL would require either a full 24 hours of downtime, or a spare production-quality server and a couple hours of downtime. It would also be helpful to complete getting rid of the Frames table first. Not correcting them is not really any option though. If we tried to say "all timestamps after 2/1/2012 are UTC but before that they are Pacific" there would be no end to the troubleshooting. Finally, we'd need to audit all the SQL logic to take out the assumptions of PST or from-PST conversions.
I really don't think it's that important to get the historical data correct: accuracy isn't extremely important here, and although we might have a day which is odd because of the switchover I don't think we should care that much.
Benjamin, You're saying that you'd be fine with all historical data displaying as 7 or 8 hours off?
Component: Socorro → General
Product: Webtools → Socorro
I think so. Why would anyone really care?
Benjamin, Well, a couple of reasons: 1) all historical data will be 7-8 hours wrong. 2) The day of transition will have a 7-hour gap, which will make that day useless for statistical purposes. I personally don't think either of the above issues are prohibitive. But I want everyone to be 100% clear on what will happen.
As long as everybody is advised of the switch, it's ok to do this (as per comment 13) IMO. We should aim for the transition day to be a non-release day, ideally in a low-traffic period such as on a weekend.
I think ideally we should target the middle two weeks of a Firefox release with the switch, as we have the least pressure on delivering data in that period.
Planning on doing this sometime next week as that's the middle of the cycle.
Commit pushed to https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/0225412d9f470aeb539d2cf726871b105ad272d1 Merge pull request #258 from twobraids/bug537003-utc-lib Bug537003 utc lib
Commit pushed to https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/e10a7669464bcee28fee3e13d25879af4821877e Merge pull request #259 from twobraids/bug537003-utc-collector-storage Bug537003 utc collector storage montor processor
Commit pushed to https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/940d8cae30364588427114813517533f5412c380 Merge pull request #260 from rhelmer/bug537003-timezone-integration Bug537003 timezone integration
Commit pushed to https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/3978c8682b2b92c15aa2a4d10fcfbd82c0c95ba8 Merge pull request #262 from twobraids/bug537003-utc-services Bug537003 utc services
Commit pushed to https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/9006b5fbef3ad85aca9f62a657ec7a9c8d8e3efe Merge pull request #263 from twobraids/bug537003-utc-global Bug537003 utc global
Commits pushed to https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/9006b5fbef3ad85aca9f62a657ec7a9c8d8e3efe Merge pull request #263 from twobraids/bug537003-utc-global https://github.com/mozilla/socorro/commit/39b08a52afcda5c67df6e023305a0df4c68003af Merge pull request #265 from mozilla/bug537003-timezone-integration Bug537003 timezone integration
Commit pushed to https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/eb8c34f542434d7558cf45535889682573e90c97 Merge pull request #266 from rhelmer/bug537003-timezone-integration remove errant second T, stick to ISO 8601
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
QA verified. Associated dependent bugs verified. Automation and manual bfts pass.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.