Closed
Bug 916414
Opened 11 years ago
Closed 11 years ago
Time in logs is wrong with mozpool
Categories
(Infrastructure & Operations :: RelOps: General, task)
Infrastructure & Operations
RelOps: General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: dividehex)
Details
Attachments
(1 file)
1003 bytes,
patch
|
dustin
:
review+
|
Details | Diff | Splinter Review |
So I was just looking at some mozpool logs to try and figure something out again.... When I see "future" lines... Note how we have 23:29 of Sept 13, then mixed inside that section 06:30 of Sept 14. We need to fix this (http://mobile-imaging-001.p1.releng.scl1.mozilla.com/ui/log.html?device=panda-0762) 2013-09-13T23:29:05 statemachine setting next image to android, boot config to {} 2013-09-13T23:29:05 statemachine entering state start_pxe_boot 2013-09-13T23:29:05 statemachine entering state pxe_power_cycling 2013-09-13T23:29:05 statemachine setting PXE config to panda-android.2 2013-09-13T23:29:05 statemachine initiating power cycle 2013-09-13T23:29:08 statemachine entering state pxe_booting 2013-09-14T06:30:26 syslog mobile-init.sh: mobile-init.sh starting. 2013-09-14T06:30:26 syslog mobile-init.sh: Reading boot args. 2013-09-14T06:30:26 syslog mobile-init.sh: second stage URL found: http://10.12.135.33/scripts/android-second-stage.sh 2013-09-14T06:30:26 syslog mobile-init.sh: syslog/imaging server IP found: 10.12.135.33 2013-09-14T06:30:26 syslog mobile-init.sh: Submitting lifeguard event at http://10.12.135.33/api/device/panda-0762/event/mobile_init_started/ 2013-09-13T23:30:26 statemachine entering state mobile_init_started 2013-09-14T06:30:26 syslog mobile-init.sh: wget success: --2013-09-13 23:30:26-- http://10.12.135.33/scripts/android-second-stage.sh#012Connecting to 10.12.135.33:80... connected.#012HTTP request sent, awaiting response... 200 OK#012Length: 2370 (2.3K) [application/x-sh]#012Saving to: `/opt/scripts/second-stage.sh'#012#012 0K .. 100% 14.8M=0s#012#0122013-09-13 23:30:26 (14.8 MB/s) - `/opt/scripts/second-stage.sh' saved [2370/2370] 2013-09-14T06:30:26 syslog mobile-init.sh: Executing second-stage.sh 2013-09-14T06:30:26 syslog second-stage.sh: beginning android-second-stage.sh 2013-09-14T06:30:26 syslog second-stage.sh: Submitting lifeguard event at http://10.12.135.33/api/device/panda-0762/event/android_downloading/ 2013-09-13T23:30:26 statemachine entering state android_downloading 2013-09-14T06:30:26 syslog second-stage.sh: got android image panda-android.2 2013-09-14T06:30:26 syslog second-stage.sh: fetching http://10.12.135.33/artifacts/panda-android.2/system.tar.bz2 2013-09-14T06:30:39 syslog second-stage.sh: fetching http://10.12.135.33/artifacts/panda-android.2/userdata.tar.bz2 2013-09-14T06:30:41 syslog second-stage.sh: fetching http://10.12.135.33/artifacts/panda-android.2/boot.tar.bz2 2013-09-14T06:30:41 syslog second-stage.sh: formatting system partition 2013-09-14T06:30:43 syslog second-stage.sh: formatting cache partition 2013-09-14T06:30:44 syslog second-stage.sh: formatting userdata partition 2013-09-14T06:30:45 syslog second-stage.sh: formatting media partition 2013-09-14T06:30:46 syslog second-stage.sh: mounting boot partition 2013-09-14T06:30:46 syslog second-stage.sh: mounting system partition 2013-09-14T06:30:46 syslog second-stage.sh: mounting userdata partition 2013-09-14T06:30:46 syslog second-stage.sh: Submitting lifeguard event at http://10.12.135.33/api/device/panda-0762/event/android_extracting/ 2013-09-13T23:30:47 statemachine entering state android_extracting 2013-09-14T06:30:47 syslog second-stage.sh: extracting boot artifact 2013-09-14T06:30:50 syslog second-stage.sh: extracting system artifact 2013-09-14T06:32:34 syslog second-stage.sh: extracting userdata artifact 2013-09-14T06:32:47 syslog second-stage.sh: unmounting partitions 2013-09-14T06:32:52 syslog second-stage.sh: imaging complete. Rebooting 2013-09-14T06:32:52 syslog second-stage.sh: Submitting lifeguard event at http://10.12.135.33/api/device/panda-0762/event/android_rebooting/ 2013-09-13T23:32:52 statemachine entering state sut_verifying 2013-09-13T23:32:52 sut connecting to SUT agent 2013-09-13T23:33:40 sut connecting to SUT agent 2013-09-13T23:34:30 sut connecting to SUT agent 2013-09-13T23:35:20 sut connecting to SUT agent 2013-09-13T23:36:10 sut connecting to SUT agent 2013-09-13T23:36:11 statemachine entering state sut_sdcard_verifying 2013-09-13T23:36:11 sut verifying SD card 2013-09-13T23:36:12 statemachine entering state operation_complete 2013-09-13T23:36:12 statemachine sending imaging result 'complete' to Mozpool 2013-09-13T23:36:12 statemachine entering state ready
Updated•11 years ago
|
Assignee: relops → jwatkins
Assignee | ||
Comment 1•11 years ago
|
||
This is started when we moved the mozpool DB off the buildbot cluster and onto the generic cluster. Mysql (or MariaDB in this case) picks up the timezone from the local system and uses that when resolving the time in the NOW() function. And since generic1.db.scl3.m.c is set to UTC, the syslog msgs that the mobile-imaging servers are sending directly to the DB are being timestamped as such. There are a couple ways to bandaid this but in the long run, I think we should really consider 1) unifying all timestamps in mozpool as UTC 2) logging timestamps at a higher resolution 3) possibly replacing the rsyslog udp listener with a python implementation driven by mozpool itself To quick fix this, we can change the insert_device_log_entry procedure to use the source timestamp sent by the pandaboards as it originally was. The squashfs live boot image was fixed awhile back to set its time properly to ntp and US/Pacific as the TZ :dustin, thoughts?
Comment 2•11 years ago
|
||
Nice detective work, and sounds good to me!
Assignee | ||
Comment 3•11 years ago
|
||
This reverts insert_device_log_entry to use the provided syslog timestamp rather then NOW(). I'll exec this manually on the statging and then production db also upon r+
Attachment #806294 -
Flags: review?(dustin)
Assignee | ||
Comment 4•11 years ago
|
||
After talking to :dustin via IRC, I updated and test the reverted procedure call on staging mozpool instance. Although mozpool stage didn't suffer from the time offset issue since staging.db.scl3 is set to us/pacific, I didn't see an change from the update.
Assignee | ||
Comment 5•11 years ago
|
||
Production mozpool has been updated with the reverted procedure and it looks like timestamps are no longer offset. Also, filed bug917597 for a better long term solution.
Comment 6•11 years ago
|
||
Comment on attachment 806294 [details] [diff] [review] revert-816732.patch What confused me initially about this patch is it reverts to using the device's timestamp with a fallback to NOW() if that timestamp is so far off that it points to a nonexistent partition (so several days in the future or a few weeks in the past).
Attachment #806294 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 7•11 years ago
|
||
schema.sql has been updated and merged https://github.com/mozilla/mozpool/commit/c2311a76215bad691c6e7f9ceafc6f3908320aff
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•