Expose application build date in client_info
Categories: Data Platform and Tools :: Glean: SDK, enhancement, P2
Tracking: Not tracked
People: Reporter: Alekhya; Assigned: janerik
References: Blocks 2 open bugs
Details
User Story
This request is to add a datetime (the application build date) to Glean pings. More context is available at https://github.com/mozilla/glam/issues/1105
Attachments: 4 files, 1 obsolete file
Comment 1 (Assignee), 3 years ago
We currently don't have the capacity to do this.
It's also not 100% clear whether Glean should do it, so this requires at least a discussion.
Comment 2 (Reporter), 3 years ago
The options that were discussed in the Glean meeting:
- Expose application build date in client_info (@janerik had thoughts about the design of such a field)
- Create some kind of mapping from internal iOS build ids to dates (might require cooperation from that team)
- Check when a build identifier was first seen, then create a date mapping from that.
Comment 3 (Assignee), 3 years ago
I had some concerns about that build date metric; I've listed them below.
String or datetime?
Passing in a string is probably easy to do in just about any build system. But then we can't properly enforce that it's the right format.
A datetime is less straightforward. The API is reasonably simple using the language's datetime equivalent, but it will require somewhat more sophisticated build system work than just export BUILD_DATE=$(date).
(And: minute or second precision?)
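To make the string-versus-datetime trade-off concrete, here is a minimal sketch of the client-side conversion. It assumes the build system exports BUILD_DATE in an explicit UTC format (a bare $(date) prints a locale- and platform-dependent string that is hard to validate) and that the value is parsed once into a timezone-aware datetime. The format string and function name are assumptions for illustration, not Glean's implementation.

```python
from datetime import datetime, timezone

# Assumed format: the build system runs
#   export BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# instead of a bare $(date).
BUILD_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_build_date(raw: str, precision: str = "second") -> datetime:
    """Turn the raw build-system string into a timezone-aware UTC datetime.

    Raises ValueError if the string does not match the expected format or
    names an impossible date (e.g. the 30th of February).
    """
    parsed = datetime.strptime(raw, BUILD_DATE_FORMAT).replace(tzinfo=timezone.utc)
    if precision == "minute":
        # Minute precision: drop the seconds.
        parsed = parsed.replace(second=0)
    elif precision != "second":
        raise ValueError(f"unsupported precision: {precision}")
    return parsed
```

Because the parse happens at build or startup time, a malformed value fails loudly on the client instead of reaching the pipeline as an unvalidated string.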
Ping size increase
It's a constant overhead in each and every ping.
Probably negligible, given all the other stuff that's already in there.
One way to reduce the impact: the metrics ping is sent with reason=upgrade once Glean detects an upgrade.
We could special-case that metric to only be in those pings (but then Glean internals get more complex to handle this).
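The "constant overhead" can be estimated directly. A minimal sketch, assuming a trimmed-down client_info object serialized as compact JSON and a hypothetical string field named build_date holding an ISO 8601 timestamp with a UTC offset; the numbers ignore any compression applied during upload.

```python
import json


def added_bytes(client_info: dict, key: str, value: str) -> int:
    """Bytes one extra string field adds to the compact, uncompressed JSON encoding."""
    compact = (",", ":")
    base = json.dumps(client_info, separators=compact)
    extended = json.dumps({**client_info, key: value}, separators=compact)
    return len(extended.encode("utf-8")) - len(base.encode("utf-8"))


# Example: a two-field client_info gains a 25-character timestamp.
info = {"app_build": "6631", "app_display_version": "95.0"}
overhead = added_bytes(info, "build_date", "2022-01-10T12:34:56+00:00")
```

Here the field adds 41 bytes (comma, quoted key, colon, quoted value) per ping, which is the quantity the special-casing idea would trade against extra complexity in Glean internals.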
Same build id, multiple build dates
We inevitably will get weird data.
Different pings with the same build id, but multiple build dates.
Different pings with the same build date, but multiple build ids.
Invalid/missing build dates. Build dates from the future. Build dates from the past. Build dates from the 30th of February.
The system needs to be robust enough to handle that. We already deal with lots of weird data; we ignore build ids/build dates with a low number of clients.
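One way to make downstream aggregation robust to conflicting values is to count distinct clients per (build id, build date) pair and keep, for each build id, the date reported by the most clients, discarding rare combinations. This is a sketch of that idea, not GLAM's actual logic; the record shape and the default min_clients threshold are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Optional, Set, Tuple


def dominant_build_dates(
    records: Iterable[Tuple[str, Optional[date], str]],
    today: date,
    min_clients: int = 100,  # hypothetical threshold
) -> Dict[str, date]:
    """Map each build id to the build date reported by the most distinct clients.

    records: (build_id, build_date, client_id) tuples; build_date is None
    when it was missing or failed to parse.
    """
    clients: Dict[Tuple[str, date], Set[str]] = defaultdict(set)
    for build_id, build_date, client_id in records:
        if build_date is None or build_date > today:
            continue  # drop missing and future-dated values
        clients[(build_id, build_date)].add(client_id)

    best: Dict[str, Tuple[int, date]] = {}
    for (build_id, build_date), ids in clients.items():
        count = len(ids)
        if count < min_clients:
            continue  # too few clients to trust this combination
        if build_id not in best or count > best[build_id][0]:
            best[build_id] = (count, build_date)
    return {build_id: d for build_id, (_, d) in best.items()}
```

The majority vote resolves "same build id, multiple build dates"; the threshold mirrors the existing practice of ignoring low-population build ids.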
Optional
If we do this, the pipeline will still need to consider that field optional for a long time, because of all the past clients.
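On the pipeline side, "optional" means every consumer must accept pings that lack the field. A minimal sketch, assuming a JSON client_info and a hypothetical build_date key in ISO 8601 with a numeric UTC offset; missing and malformed values both become None.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ClientInfo:
    app_build: str
    # None for clients that predate the field or sent an unparseable value.
    build_date: Optional[datetime] = None


def parse_client_info(payload: str) -> ClientInfo:
    raw = json.loads(payload)
    build_date = None
    value = raw.get("build_date")
    if isinstance(value, str):
        try:
            build_date = datetime.fromisoformat(value)
        except ValueError:
            build_date = None  # treat malformed values like missing ones
    return ClientInfo(app_build=raw["app_build"], build_date=build_date)
```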
firefox-ios builds on Bitrise.
It seems the build number listed there corresponds to the application's build id.
See e.g. https://app.bitrise.io/build/0049cba1-3121-4f1b-9771-20a5e128355d
Version: 95.0
Build number: 6631
That's one that I also see in TestFlight.
Bitrise has an API, so we could script a backfill of old builds with their build dates.
If we had such a service, we could go with option 2 (a mapping from build ids to dates) and have Bitrise ping that service with $BITRISE_BUILD_NUMBER in the payload. Then we wouldn't need to transport that information in pings.
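The server-side alternative boils down to a lookup table joined onto pings. A minimal sketch, assuming (as observed above for firefox-ios) that the app's build id equals the CI build number, and that CI build metadata has already been exported as (build number, timestamp) pairs; how that export is fetched is out of scope, and no CI API calls are shown.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def build_date_lookup(ci_builds: List[Tuple[str, datetime]]) -> Dict[str, datetime]:
    """Build a build-number -> build-date table from exported CI metadata."""
    lookup: Dict[str, datetime] = {}
    for build_number, built_at in ci_builds:
        # Keep the earliest timestamp if a build number appears more than once.
        if build_number not in lookup or built_at < lookup[build_number]:
            lookup[build_number] = built_at
    return lookup


def annotate(app_build: str, lookup: Dict[str, datetime]) -> Optional[datetime]:
    """Return the build date for a ping's build id, or None if it is unknown."""
    return lookup.get(app_build)
```

The None branch is where the brittleness shows up: any build id that never reached the table (other CI systems, other applications) silently has no date.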
Comment 4 (Reporter), 3 years ago
Thank you for the comments, Jan-Erik.
Regarding "String or datetime?":
As far as GLAM is concerned, it expects yyyymmhh format.
I am working on code that will verify that the string's value is actually a valid date and, if not, filter out the records.
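A validate-then-filter step like the one described could look like the sketch below. The date format is a parameter; %Y%m%d in the usage example is only an illustration. The round-trip check matters because strptime accepts single-digit months and days, so a value such as "2022011" would otherwise parse.

```python
from datetime import datetime
from typing import Iterable, Iterator


def is_valid_date(value: str, fmt: str) -> bool:
    """True if value parses with fmt and formats back to exactly the same string."""
    try:
        return datetime.strptime(value, fmt).strftime(fmt) == value
    except ValueError:
        return False  # wrong shape or impossible date (e.g. Feb 30)


def filter_records(records: Iterable[dict], fmt: str, field: str = "build_date") -> Iterator[dict]:
    """Yield only records whose field holds a valid, canonically formatted date."""
    for record in records:
        value = record.get(field)
        if isinstance(value, str) and is_valid_date(value, fmt):
            yield record
```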
Comment 5, 3 years ago
(In reply to Jan-Erik Rediger [:janerik] from comment #3)
> Passing in a string is probably easy to do in just about any build system. But then we can't properly enforce that it's the right format.
> Datetime is less straightforward. The API is reasonably simple using the language's datetime equivalent, but will require a bit more sophisticated build system work than just export BUILD_DATE=$(date).
> (And: minute or second precision?)
Yes, some client engineering work would be required to translate a string passed in from the build system into a proper datetime. I think this should be relatively manageable, and there are a couple of reasons to prefer a proper datetime over a string:
- We can enforce the type, etc. (as you noted)
- Downstream tooling (e.g. BigQuery, Looker) will automatically pick up the type which will make querying it easier.
> Ping size increase
> It's a constant overhead in each and every ping.
> Probably negligible, given all the other stuff that's already in there.
> One way to reduce the impact: The metrics ping sends a specific reason=upgrade ping once it detects an upgrade.
> We could special-case that metric to only be in those pings (but then Glean internals get more complex to handle this).
Yeah IMO it should be in every ping, as the whole idea here is to make many types of common analysis easier. If you need to jump through a bunch of hoops to get this measure, it's less useful.
> Same build id, multiple build dates
> We inevitably will get weird data.
> Different pings with the same build id, but multiple build dates.
> Different pings with the same build date, but multiple build ids.
> Invalid/missing build dates. Build dates from the future. Build dates from the past. Build dates from the 30th of February.
> The system needs to be robust enough to handle that. We already deal with lots of weird data, we ignore build ids/build dates with a low number of clients.
Yes, this isn't a magical solution. All the analysis gotchas still apply.
> Optional
> If we do this, the pipeline will still need to consider that field optional for a long time, because of all the past clients.
IMO that's ok, especially for something like GLAM whose purpose is to measure the behaviour of clients running newer versions.
> firefox-ios builds on bitrise.
> It seems the listed build number there corresponds to the build id the application has.
> See e.g. https://app.bitrise.io/build/0049cba1-3121-4f1b-9771-20a5e128355d
> Version: 95.0
> Build number: 6631
> That's one that I also see in Testflight.
> Bitrise has an API, so we could script that to backfill old builds with their build dates.
> If we had such a service we could go with 2) and have bitrise ping that service with $BITRISE_BUILD_NUMBER in the payload. Then we wouldn't need to transport that information in pings.
That might work for this particular application, but it seems more brittle than just including the information in the ping and I worry about having to solve this problem repeatedly in the future. If we can count on the datetime appearing in the metrics ping, we'll have a system which "just works" for GLAM for all future applications using the Glean SDK.
Comment 6 (Assignee), 3 years ago
Thanks, :wlach, for that input.
Given that glean_parser runs on every build, and release builds in particular start from a clean slate, we could do all that work within glean_parser.
For Android we already generate a BuildInfo struct to pass along the version.
We can extend that with a date, and also generate it for Swift.
It would then contain the date and time at which glean_parser ran.
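Since glean_parser is written in Python, the idea can be sketched as a small code generator that stamps the generation time into a Kotlin source file. The template, object name and package below are hypothetical and do not reproduce glean_parser's actual output; a Swift template would follow the same pattern.

```python
from datetime import datetime, timezone

# Hypothetical Kotlin template; doubled braces are literal braces for str.format.
KOTLIN_TEMPLATE = """\
package {package}

import java.util.Calendar
import java.util.TimeZone

internal object GeneratedBuildInfo {{
    val buildDate: Calendar by lazy {{
        Calendar.getInstance(TimeZone.getTimeZone("GMT+0")).also {{ cal ->
            cal.set({year}, {month0}, {day}, {hour}, {minute}, {second})
            cal.set(Calendar.MILLISECOND, 0)
        }}
    }}
}}
"""


def render_build_info(package: str, built_at: datetime) -> str:
    """Render the Kotlin source; built_at must be timezone-aware."""
    utc = built_at.astimezone(timezone.utc)
    return KOTLIN_TEMPLATE.format(
        package=package,
        year=utc.year,
        month0=utc.month - 1,  # java.util.Calendar months are zero-based
        day=utc.day,
        hour=utc.hour,
        minute=utc.minute,
        second=utc.second,
    )
```

Calling render_build_info with datetime.now(timezone.utc) gives "the date and time when glean_parser ran"; passing a different timestamp is what the reproducible-builds concern raised later in the thread would change.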
Comment 7 (Assignee), 3 years ago
Alekhya and I agreed to turn this into a small proposal, both for documentation and to clear up any remaining questions.
Comment 8 (Assignee), 3 years ago
Comment 9 (Assignee), 3 years ago
Well, that didn't work.
So let's do it as a comment instead:
Proposal: Expose application build date in client_info for iOS
Comment 10, 3 years ago
Seems like a solid proposal. I wonder a bit whether this throws a monkey wrench into "reproducible builds", since every rebuild will generate a different binary. Using the timestamp of the HEAD commit wouldn't have this problem. Should we confirm with buildeng that this won't be an issue first?
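The reproducible-builds concern suggests deriving the timestamp from the source rather than the wall clock. A minimal sketch, assuming the generator may read the SOURCE_DATE_EPOCH environment variable (the reproducible-builds.org convention) and otherwise asks git for the HEAD commit's committer timestamp; the fallback to the current time is only for non-git builds and is not reproducible. The function name is hypothetical.

```python
import os
import subprocess
from datetime import datetime, timezone
from typing import Optional


def build_timestamp(repo: Optional[str] = None) -> datetime:
    """Pick a build timestamp that stays the same across rebuilds of one commit.

    Order: SOURCE_DATE_EPOCH, then the committer date of HEAD, then "now".
    """
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    try:
        out = subprocess.run(
            ["git", "log", "-1", "--format=%ct", "HEAD"],  # %ct: committer date, Unix time
            cwd=repo, capture_output=True, text=True, check=True,
        ).stdout.strip()
        return datetime.fromtimestamp(int(out), tz=timezone.utc)
    except (OSError, subprocess.CalledProcessError, ValueError):
        # Not a git checkout or git unavailable: fall back to the current
        # time, which defeats reproducibility.
        return datetime.now(timezone.utc)
```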
Comment 11, 3 years ago
I read the proposal and left my feedback on it. General +1 to adding the field.
Comment 12, 3 years ago
Comment 13, 3 years ago
Comment 14, 3 years ago
Comment 15 (Assignee), 3 years ago
No further comments were made on the proposal.
I made minor adjustments today, but they don't change it significantly.
The proposal is therefore accepted.
Proposal document: https://docs.google.com/document/d/1_7kTePQHHRhsAqOYPiw8ptoN9ytRnsWMcN-tddnV0Cg/edit#
(The document is now publicly readable.)
The above PRs implement the spec.
Comment 16 (Assignee), 3 years ago
Comment 17, 3 years ago
Comment on attachment 9258020 (1742448-data-review.txt)
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry, so it can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes, Jan-Erik Rediger is responsible.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does the data collection use a third-party collection tool?
No.
Result: datareview+
Comment 18 (Assignee), 3 years ago
Comment 19 (Assignee), 3 years ago
Comment 20 (Assignee), 3 years ago
Comment 21 (Assignee), 3 years ago
Release coming: https://github.com/mozilla/glean/releases/tag/v43.0.0.
Closing this bug in favor of a followup to bring it to iOS.