Closed Bug 1585021 Opened 6 years ago Closed 6 years ago

Events have negative timestamps

Categories

(Data Platform and Tools :: Glean: SDK, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mdroettboom, Assigned: mdroettboom)

References

Details

(Whiteboard: [telemetry:glean-rs:m?])

Attachments

(2 files)

Pings are being schema rejected by with the following:

org.everit.json.schema.ValidationException: #/events/1/timestamp: -3793 is not greater or equal to 0

This might be because events with two different timing sources are ending up in the events queue, or merely that events are being recorded out of order.

One possible fix might just be to truncate negative values to 0.

Assignee: nobody → mdroettboom

I did an analysis of the pings that are rejected due to negative event timestamps. On today's set of 457 pings, 416 (91%) of them follow the same pattern:

The first and last event in the events array are identical. The events in the middle are all in sequential order (though negative due to the way event timestamps are offset relative to the first event). This makes me think that the duplicate event at the beginning and end really belongs only at the end, but somehow there is a race condition where it is ending up in both places.

Glean currently writes events to append-only files on disk so that if the application is killed, we can sent the events immediately on startup. Since this "check for events and send" at startup does I/O, it is run on the same single-threaded dispatching queue as all other Glean work at startup. Treating it this way worked correctly until "pre-initialization queuing" was added. Now we have the following scenario:

  • Run #1:
    • Glean is initialized
    • Record event A
      • File: [A], Memory [A]
    • Record event B
      • File [A, B], Memory [A, B]
  • app is killed ungracefully by the OS
  • Run #2:
    • Record event C (probably something early like AppOpened)
      • File: [A, B, C], Memory [C]
    • Glean is initialized -- this causes the loading of pre-existing events from disk without clearing any events in memory
      • File: [A, B, C], Memory [C, A, B, C]

Since when sending events, all timestamps are adjusted to be relative to the first event, events A and B will have negative timestamps, and C is duplicated (both with a timestamp of 0).

Most of the time, Run #1 and Run #2 are on the same boot of the device, so they at least use the same monotonic timer (milliseconds since boot). However, sometimes Run #1 is killed and the device is rebooted before Run #2 (such as in a battery drain scenario). In that case, the same duplicate event problem happens but we may not get negative timestamps, since the events were recorded on different monotonic timers.

This can probably be resolved by moving the "load and send events" work entirely before playing back any pre-initialization tasks in the task queue. This is tricky because "load and send" ends up spawning additional tasks in order to perform the sending, and all of those would need to complete before playing back any of the pre-initialization-queued tasks.

Here's a unit test to reproduce the issue: https://github.com/mdboom/android-components/pull/new/duplicate-events

Flags: needinfo?(tlong)
Flags: needinfo?(alessio.placitelli)

This is another brilliant piece of work. I agree that this definitely needs to be addressed and I've left some comments regarding your implementation in the GitHub PR here.

Flags: needinfo?(tlong)

(In reply to Travis Long from comment #4)

This is another brilliant piece of work. I agree that this definitely needs to be addressed and I've left some comments regarding your implementation in the GitHub PR here.

100% agree :)

Flags: needinfo?(alessio.placitelli)
Blocks: 1585648
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: