Events have negative timestamps
Categories
(Data Platform and Tools :: Glean: SDK, defect, P3)
Tracking
(Not tracked)
People
(Reporter: mdroettboom, Assigned: mdroettboom)
References
Details
(Whiteboard: [telemetry:glean-rs:m?])
Attachments
(2 files)
Pings are being schema rejected by with the following:
org.everit.json.schema.ValidationException: #/events/1/timestamp: -3793 is not greater or equal to 0
This might be because events with two different timing sources are ending up in the events queue, or merely that events are being recorded out of order.
One possible fix might just be to truncate negative values to 0.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 1•6 years ago
|
||
I did an analysis of the pings that are rejected due to negative event timestamps. On today's set of 457 pings, 416 (91%) of them follow the same pattern:
The first and last event in the events array are identical. The events in the middle are all in sequential order (though negative due to the way event timestamps are offset relative to the first event). This makes me think that the duplicate event at the beginning and end really belongs only at the end, but somehow there is a race condition where it is ending up in both places.
Assignee | ||
Comment 2•6 years ago
•
|
||
Glean currently writes events to append-only files on disk so that if the application is killed, we can sent the events immediately on startup. Since this "check for events and send" at startup does I/O, it is run on the same single-threaded dispatching queue as all other Glean work at startup. Treating it this way worked correctly until "pre-initialization queuing" was added. Now we have the following scenario:
- Run #1:
- Glean is initialized
- Record event A
- File: [A], Memory [A]
- Record event B
- File [A, B], Memory [A, B]
- app is killed ungracefully by the OS
- Run #2:
- Record event C (probably something early like AppOpened)
- File: [A, B, C], Memory [C]
- Glean is initialized -- this causes the loading of pre-existing events from disk without clearing any events in memory
- File: [A, B, C], Memory [C, A, B, C]
- Record event C (probably something early like AppOpened)
Since when sending events, all timestamps are adjusted to be relative to the first event, events A and B will have negative timestamps, and C is duplicated (both with a timestamp of 0).
Most of the time, Run #1 and Run #2 are on the same boot of the device, so they at least use the same monotonic timer (milliseconds since boot). However, sometimes Run #1 is killed and the device is rebooted before Run #2 (such as in a battery drain scenario). In that case, the same duplicate event problem happens but we may not get negative timestamps, since the events were recorded on different monotonic timers.
This can probably be resolved by moving the "load and send events" work entirely before playing back any pre-initialization tasks in the task queue. This is tricky because "load and send" ends up spawning additional tasks in order to perform the sending, and all of those would need to complete before playing back any of the pre-initialization-queued tasks.
Here's a unit test to reproduce the issue: https://github.com/mdboom/android-components/pull/new/duplicate-events
Assignee | ||
Comment 3•6 years ago
|
||
Comment 4•6 years ago
|
||
This is another brilliant piece of work. I agree that this definitely needs to be addressed and I've left some comments regarding your implementation in the GitHub PR here.
Comment 5•6 years ago
|
||
(In reply to Travis Long from comment #4)
This is another brilliant piece of work. I agree that this definitely needs to be addressed and I've left some comments regarding your implementation in the GitHub PR here.
100% agree :)
Assignee | ||
Comment 6•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Comment hidden (collapsed) |
Description
•