Closed Bug 1742448 Opened 3 years ago Closed 2 years ago

Expose application build date in client_info

Categories

(Data Platform and Tools :: Glean: SDK, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Alekhya, Assigned: janerik)

References

(Blocks 2 open bugs)

Details

User Story

This request is to add datetime to the glean pings.
More context is available at the link:
https://github.com/mozilla/glam/issues/1105

Attachments

(4 files, 1 obsolete file)

User Story: (updated)
Summary: Implement getting the “build-date” into all Glean pings → Implement - getting the build datetime into all Glean pings
Type: defect → enhancement
Priority: -- → P4
Summary: Implement - getting the build datetime into all Glean pings → Expose application build date in client_info
Whiteboard: [telemetry:glean-rs:m?]

We currently don't have the capacity to really do this.
It's also not 100% clear if Glean should do it, so this at least requires a discussion.

Priority: P4 → P3
Whiteboard: [telemetry:glean-rs:m?]

The options that were discussed in the Glean meeting:

  1. Expose application build date in client_info (@janerik had thoughts about the design of such a field)
  2. Create some kind of mapping from internal iOS build ids to dates (might require cooperation from that team)
  3. Record the first time a build identifier was seen, then create a date mapping from that.
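
A minimal sketch of option 3, assuming the pipeline can produce (build id, submission date) pairs. The function name and the sample dates are hypothetical and only illustrate the idea of taking the earliest sighting per build id:

```python
from datetime import date


def first_seen_dates(pings):
    """Map each build id to the earliest submission date it appeared on."""
    first_seen = {}
    for build_id, submission_date in pings:
        known = first_seen.get(build_id)
        if known is None or submission_date < known:
            first_seen[build_id] = submission_date
    return first_seen


# Hypothetical (build_id, submission_date) pairs, not real data.
pings = [
    ("6631", date(2021, 11, 18)),
    ("6631", date(2021, 11, 16)),
    ("6640", date(2021, 11, 20)),
]
mapping = first_seen_dates(pings)
```

The first-seen date is only an approximation of the build date: it lags by however long it takes the first client to install the build and submit a ping.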
Flags: needinfo?(jrediger)

So I had some concerns about that build date metric, listing them out below.

String or datetime?

Passing in a string is probably easy to do in just about any build system. But then we can't properly enforce that it's the right format.
Datetime is less straightforward. The API is reasonably simple using the language's datetime equivalent, but it will require somewhat more sophisticated build system work than just export BUILD_DATE=$(date).
(And: minute or second precision?)
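
To illustrate the trade-off, here is a sketch of turning a build-system string into a proper datetime. It assumes the build system exports an ISO 8601 timestamp with an explicit UTC offset (for example via date -u +%Y-%m-%dT%H:%M:%S+00:00) and that second precision is wanted. parse_build_date is a hypothetical helper, not part of Glean:

```python
from datetime import datetime, timezone


def parse_build_date(raw):
    """Parse an ISO 8601 string into an aware UTC datetime (second precision).

    Raises ValueError for malformed input or input without a UTC offset.
    """
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        raise ValueError("build date must carry a UTC offset")
    return parsed.astimezone(timezone.utc).replace(microsecond=0)


build_date = parse_build_date("2021-11-22T11:15:30+01:00")
```

Unformatted output of plain date (such as "Mon Nov 22 10:15:30 UTC 2021") is rejected with a ValueError, which is the kind of enforcement a bare string cannot give.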

Ping size increase

It's a constant overhead in each and every ping.
Probably negligible, given all the other stuff that's already in there.
One way to reduce the impact: The metrics ping sends a specific reason=upgrade ping once it detects an upgrade.
We could special-case that metric to only be in those pings (but then Glean internals get more complex to handle this).

Same build id, multiple build dates

We inevitably will get weird data.
Different pings with the same build id, but multiple build dates.
Different pings with the same build date, but multiple build ids.
Invalid/missing build dates. Build dates from the future. Build dates from the past. Build dates from the 30th of February.
The system needs to be robust enough to handle that. We already deal with lots of weird data: we ignore build ids/build dates with a low number of clients.
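
A sketch of that kind of filtering, assuming rows of (client id, build id, build date) are available downstream. The function name, the row shape, and the min_clients threshold are hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timezone


def plausible_builds(rows, now, min_clients=2):
    """Return (build_id, build_date) pairs seen on at least min_clients clients.

    Rows with a missing build date or a build date in the future are dropped.
    """
    clients = defaultdict(set)
    for client_id, build_id, build_date in rows:
        if build_date is None or build_date > now:
            continue
        clients[(build_id, build_date)].add(client_id)
    return {key for key, ids in clients.items() if len(ids) >= min_clients}


utc = timezone.utc
d1 = datetime(2021, 11, 15, tzinfo=utc)
d2 = datetime(2021, 11, 16, tzinfo=utc)
rows = [
    ("a", "6631", d1),
    ("b", "6631", d1),
    ("c", "6631", d2),                             # same build id, other date, one client
    ("d", "6640", datetime(2031, 1, 1, tzinfo=utc)),  # build date from the future
    ("e", "6650", None),                           # missing build date
]
kept = plausible_builds(rows, now=datetime(2021, 12, 1, tzinfo=utc))
```

Dates such as the 30th of February never reach this step if the value is parsed into a datetime first, since parsing already fails for them.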

Optional

If we do this, the pipeline will still need to consider that field optional for a long time, because of all the past clients.


firefox-ios builds on Bitrise.
It seems the listed build number there corresponds to the build id the application has.
See e.g. https://app.bitrise.io/build/0049cba1-3121-4f1b-9771-20a5e128355d

Version: 95.0
Build number: 6631

That's one that I also see in TestFlight.
Bitrise has an API, so we could script that to backfill old builds with their build dates.

If we had such a service we could go with 2) and have Bitrise ping that service with $BITRISE_BUILD_NUMBER in the payload. Then we wouldn't need to transport that information in pings.

Flags: needinfo?(jrediger)

Thank you for the comments, Jan-Erik.
Regarding "String or datetime?":
As far as GLAM is concerned, it expects yyyymmhh format.
I am working on code that will verify that the value of the string is actually a valid date and, if not, filter out the records.
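
Such a check could look roughly like the sketch below. The strptime format is an assumption made for illustration ("%Y%m%d%H", a build-id-like layout) and is passed as a parameter, so it can be swapped for whatever GLAM actually expects. Both function names are hypothetical:

```python
from datetime import datetime


def is_valid_build_date(value, fmt="%Y%m%d%H"):
    """True if value parses under fmt and round-trips to the same string."""
    try:
        parsed = datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        return False
    # The round-trip rejects inputs that parse only leniently (e.g. missing
    # zero padding), so "2021111" is not accepted as a date.
    return parsed.strftime(fmt) == value


def filter_valid(values, fmt="%Y%m%d%H"):
    """Keep only the values that are valid dates under fmt."""
    return [v for v in values if is_valid_build_date(v, fmt)]


valid = filter_valid(["2021111510", "2021023010", "notadate", "2021111", None])
```

Here "2021023010" is dropped because it would be the 30th of February.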

(In reply to Jan-Erik Rediger [:janerik] from comment #3)

> Passing in a string is probably easy to do in just about any build system. But then we can't properly enforce that it's the right format.
> Datetime is less straightforward. The API is reasonably simple using the language's datetime equivalent, but it will require somewhat more sophisticated build system work than just export BUILD_DATE=$(date).
> (And: minute or second precision?)

Yes, some client engineering work would be required to translate a string passed in from the build system to a proper date time. I think this should be relatively manageable and there's a couple of reasons to prefer a proper datetime over a string:

  1. Can enforce the type, etc. (as you noted)
  2. Downstream tooling (e.g. BigQuery, Looker) will automatically pick up the type which will make querying it easier.

> Ping size increase
>
> It's a constant overhead in each and every ping.
> Probably negligible, given all the other stuff that's already in there.
> One way to reduce the impact: The metrics ping sends a specific reason=upgrade ping once it detects an upgrade.
> We could special-case that metric to only be in those pings (but then Glean internals get more complex to handle this).

Yeah IMO it should be in every ping, as the whole idea here is to make many types of common analysis easier. If you need to jump through a bunch of hoops to get this measure, it's less useful.

> Same build id, multiple build dates
>
> We inevitably will get weird data.
> Different pings with the same build id, but multiple build dates.
> Different pings with the same build date, but multiple build ids.
> Invalid/missing build dates. Build dates from the future. Build dates from the past. Build dates from the 30th of February.
> The system needs to be robust enough to handle that. We already deal with lots of weird data: we ignore build ids/build dates with a low number of clients.

Yes, this isn't a magical solution. All the analysis gotchas still apply.

> Optional
>
> If we do this, the pipeline will still need to consider that field optional for a long time, because of all the past clients.

IMO that's ok, especially for something like GLAM whose purpose is to measure the behaviour of clients running newer versions.


> firefox-ios builds on Bitrise.
> It seems the listed build number there corresponds to the build id the application has.
> See e.g. https://app.bitrise.io/build/0049cba1-3121-4f1b-9771-20a5e128355d
>
> Version: 95.0
> Build number: 6631
>
> That's one that I also see in TestFlight.
> Bitrise has an API, so we could script that to backfill old builds with their build dates.
>
> If we had such a service we could go with 2) and have Bitrise ping that service with $BITRISE_BUILD_NUMBER in the payload. Then we wouldn't need to transport that information in pings.

That might work for this particular application, but it seems more brittle than just including the information in the ping and I worry about having to solve this problem repeatedly in the future. If we can count on the datetime appearing in the metrics ping, we'll have a system which "just works" for GLAM for all future applications using the Glean SDK.

Thanks, :wlach for that input.


Given that glean_parser runs on all builds, and especially release builds will be from a clean slate, we could do all that work within glean_parser.
For Android we already generate a BuildInfo struct to pass along the version.
We can extend that with a date. And also generate it for Swift.
It would then have the date & time when the glean_parser ran.
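
As an illustration of the idea only, a code generator could stamp the date into the generated source roughly like this. The object name ExampleBuildInfo, its fields, and render_build_info are hypothetical and do not reflect glean_parser's actual templates or output:

```python
from datetime import datetime, timezone


def render_build_info(version, build_date):
    """Render a hypothetical Kotlin object carrying version and build date."""
    # Normalize to UTC with second precision before embedding.
    stamp = build_date.astimezone(timezone.utc).replace(microsecond=0)
    return (
        "internal object ExampleBuildInfo {\n"
        f'    const val version = "{version}"\n'
        f'    const val buildDate = "{stamp.isoformat()}"\n'
        "}\n"
    )


source = render_build_info(
    "95.0", datetime(2021, 11, 22, 10, 15, 30, 123456, tzinfo=timezone.utc)
)
```

In a real generator the build_date argument would be the time at which the generator ran, for example datetime.now(timezone.utc).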

Assignee: nobody → jrediger
Priority: P3 → P2

Alekhya and I agreed to turn this into a small proposal, both for documentation and to clear up any remaining questions.

Attachment #9254507 - Flags: feedback?(rmiller)
Attachment #9254507 - Flags: feedback?(mdroettboom)
Attachment #9254507 - Flags: feedback?(brizental)
Attachment #9254507 - Attachment is obsolete: true
Attachment #9254507 - Flags: feedback?(rmiller)
Attachment #9254507 - Flags: feedback?(mdroettboom)
Attachment #9254507 - Flags: feedback?(brizental)

Well, that didn't work.
So let's do it as a comment instead:

Proposal: Expose application build date in client_info for iOS

Flags: needinfo?(wlachance)
Flags: needinfo?(rmiller)
Flags: needinfo?(mdroettboom)
Flags: needinfo?(brizental)
Flags: needinfo?(brizental)

Seems like a solid proposal. I wonder a bit whether this throws a monkey wrench into "reproducible builds", since every rebuild will generate a different binary. Using the timestamp of the HEAD commit wouldn't have this problem. Should we confirm with buildeng that this won't be an issue first?
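
One common way to address the reproducibility concern is the SOURCE_DATE_EPOCH convention from the Reproducible Builds project: if the build environment sets that variable (for instance to the HEAD commit's timestamp), the generator uses it instead of the wall clock. A sketch, assuming the build environment provides the variable; build_timestamp is a hypothetical helper:

```python
import os
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the build time as an aware UTC datetime.

    Prefers SOURCE_DATE_EPOCH (seconds since the Unix epoch) when set, so
    rebuilding the same source yields the same value; otherwise falls back
    to the current time.
    """
    if environ is None:
        environ = os.environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return datetime.now(timezone.utc).replace(microsecond=0)


pinned = build_timestamp({"SOURCE_DATE_EPOCH": "0"})
```

With the variable set, two builds of the same commit embed the same date; without it, every rebuild gets a fresh timestamp.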

Flags: needinfo?(mdroettboom)
Flags: needinfo?(wlachance)

Read and gave feedback I had to the proposal. General +1 to adding the field.

Flags: needinfo?(rmiller)

No further comments were made on the proposal.
I made minor adjustments today, but they don't change it significantly.

The proposal is therefore accepted.
Proposal document: https://docs.google.com/document/d/1_7kTePQHHRhsAqOYPiw8ptoN9ytRnsWMcN-tddnV0Cg/edit#
(document is public-readable now).

The above PRs implement the spec.

Attachment #9258020 - Flags: data-review?(chutten)

Comment on attachment 9258020 [details]
1742448-data-review.txt

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry, so it can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Jan-Erik Rediger is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does the data collection use a third-party collection tool?

No.


Result: datareview+

Attachment #9258020 - Flags: data-review?(chutten) → data-review+
Blocks: 1749493
Blocks: 1749494
Blocks: 1749495

Release coming: https://github.com/mozilla/glean/releases/tag/v43.0.0.
Closing this bug in favor of a followup to bring it to iOS.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Blocks: 1750524
Blocks: 1750544