Time in logs is wrong with mozpool

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: Callek, Assigned: dividehex)

Tracking

Details

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
So I was just looking at some mozpool logs to try and figure something out again.... When I see "future" lines...

Note how we have 23:29 of Sept 13, then mixed inside that section 06:30 of Sept 14. We need to fix this (http://mobile-imaging-001.p1.releng.scl1.mozilla.com/ui/log.html?device=panda-0762)

2013-09-13T23:29:05 statemachine setting next image to android, boot config to {}
2013-09-13T23:29:05 statemachine entering state start_pxe_boot
2013-09-13T23:29:05 statemachine entering state pxe_power_cycling
2013-09-13T23:29:05 statemachine setting PXE config to panda-android.2
2013-09-13T23:29:05 statemachine initiating power cycle
2013-09-13T23:29:08 statemachine entering state pxe_booting
2013-09-14T06:30:26 syslog mobile-init.sh: mobile-init.sh starting.
2013-09-14T06:30:26 syslog mobile-init.sh: Reading boot args.
2013-09-14T06:30:26 syslog mobile-init.sh: second stage URL found: http://10.12.135.33/scripts/android-second-stage.sh
2013-09-14T06:30:26 syslog mobile-init.sh: syslog/imaging server IP found: 10.12.135.33
2013-09-14T06:30:26 syslog mobile-init.sh: Submitting lifeguard event at http://10.12.135.33/api/device/panda-0762/event/mobile_init_started/
2013-09-13T23:30:26 statemachine entering state mobile_init_started
2013-09-14T06:30:26 syslog mobile-init.sh: wget success: --2013-09-13 23:30:26-- http://10.12.135.33/scripts/android-second-stage.sh#012Connecting to 10.12.135.33:80... connected.#012HTTP request sent, awaiting response... 200 OK#012Length: 2370 (2.3K) [application/x-sh]#012Saving to: `/opt/scripts/second-stage.sh'#012#012 0K .. 100% 14.8M=0s#012#0122013-09-13 23:30:26 (14.8 MB/s) - `/opt/scripts/second-stage.sh' saved [2370/2370]
2013-09-14T06:30:26 syslog mobile-init.sh: Executing second-stage.sh
2013-09-14T06:30:26 syslog second-stage.sh: beginning android-second-stage.sh
2013-09-14T06:30:26 syslog second-stage.sh: Submitting lifeguard event at http://10.12.135.33/api/device/panda-0762/event/android_downloading/
2013-09-13T23:30:26 statemachine entering state android_downloading
2013-09-14T06:30:26 syslog second-stage.sh: got android image panda-android.2
2013-09-14T06:30:26 syslog second-stage.sh: fetching http://10.12.135.33/artifacts/panda-android.2/system.tar.bz2
2013-09-14T06:30:39 syslog second-stage.sh: fetching http://10.12.135.33/artifacts/panda-android.2/userdata.tar.bz2
2013-09-14T06:30:41 syslog second-stage.sh: fetching http://10.12.135.33/artifacts/panda-android.2/boot.tar.bz2
2013-09-14T06:30:41 syslog second-stage.sh: formatting system partition
2013-09-14T06:30:43 syslog second-stage.sh: formatting cache partition
2013-09-14T06:30:44 syslog second-stage.sh: formatting userdata partition
2013-09-14T06:30:45 syslog second-stage.sh: formatting media partition
2013-09-14T06:30:46 syslog second-stage.sh: mounting boot partition
2013-09-14T06:30:46 syslog second-stage.sh: mounting system partition
2013-09-14T06:30:46 syslog second-stage.sh: mounting userdata partition
2013-09-14T06:30:46 syslog second-stage.sh: Submitting lifeguard event at http://10.12.135.33/api/device/panda-0762/event/android_extracting/
2013-09-13T23:30:47 statemachine entering state android_extracting
2013-09-14T06:30:47 syslog second-stage.sh: extracting boot artifact
2013-09-14T06:30:50 syslog second-stage.sh: extracting system artifact
2013-09-14T06:32:34 syslog second-stage.sh: extracting userdata artifact
2013-09-14T06:32:47 syslog second-stage.sh: unmounting partitions
2013-09-14T06:32:52 syslog second-stage.sh: imaging complete. Rebooting
2013-09-14T06:32:52 syslog second-stage.sh: Submitting lifeguard event at http://10.12.135.33/api/device/panda-0762/event/android_rebooting/
2013-09-13T23:32:52 statemachine entering state sut_verifying
2013-09-13T23:32:52 sut connecting to SUT agent
2013-09-13T23:33:40 sut connecting to SUT agent
2013-09-13T23:34:30 sut connecting to SUT agent
2013-09-13T23:35:20 sut connecting to SUT agent
2013-09-13T23:36:10 sut connecting to SUT agent
2013-09-13T23:36:11 statemachine entering state sut_sdcard_verifying
2013-09-13T23:36:11 sut verifying SD card
2013-09-13T23:36:12 statemachine entering state operation_complete
2013-09-13T23:36:12 statemachine sending imaging result 'complete' to Mozpool
2013-09-13T23:36:12 statemachine entering state ready
Assignee: relops → jwatkins
(Assignee)

Comment 1

4 years ago
This is started when we moved the mozpool DB off the buildbot cluster and onto the generic cluster.  Mysql (or MariaDB in this case) picks up the timezone from the local system and uses that when resolving the time in the NOW() function.  And since generic1.db.scl3.m.c is set to UTC, the syslog msgs that the mobile-imaging servers are sending directly to the DB are being timestamped as such.

There are a couple ways to bandaid this but in the long run, I think we should really consider
1) unifying all timestamps in mozpool as UTC
2) logging timestamps at a higher resolution
3) possibly replacing the rsyslog udp listener with a python implementation driven by mozpool itself

To quick fix this, we can change the insert_device_log_entry procedure to use the source timestamp sent by the pandaboards as it originally was.  The squashfs live boot image was fixed awhile back to set its time properly to ntp and US/Pacific as the TZ

:dustin, thoughts?
Nice detective work, and sounds good to me!
(Assignee)

Comment 3

4 years ago
Created attachment 806294 [details] [diff] [review]
revert-816732.patch

This reverts insert_device_log_entry to use the provided syslog timestamp rather then NOW().   I'll exec this manually on the statging and then production db also upon r+
Attachment #806294 - Flags: review?(dustin)
(Assignee)

Comment 4

4 years ago
After talking to :dustin via IRC, I updated and test the reverted procedure call on staging mozpool instance.  Although mozpool stage didn't suffer from the time offset issue since staging.db.scl3 is set to us/pacific, I didn't see an change from the update.
(Assignee)

Comment 5

4 years ago
Production mozpool has been updated with the reverted procedure and it looks like timestamps are no longer offset.  Also, filed bug917597 for a better long term solution.
Comment on attachment 806294 [details] [diff] [review]
revert-816732.patch

What confused me initially about this patch is it reverts to using the device's timestamp with a fallback to NOW() if that timestamp is so far off that it points to a nonexistent partition (so several days in the future or a few weeks in the past).
Attachment #806294 - Flags: review?(dustin) → review+
(Assignee)

Comment 7

4 years ago
schema.sql has been updated and merged
https://github.com/mozilla/mozpool/commit/c2311a76215bad691c6e7f9ceafc6f3908320aff
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.