Get Darwin OS version in the main ping and clients_daily
Categories
(Data Platform and Tools :: General, enhancement)
Tracking
(Not tracked)
People
(Reporter: wbeard, Assigned: benwu)
Details
Attachments
(2 files)
The normalized_os_version field in clients daily gives version numbers like 18.6.0 (and several field in the main ping that correspond to this). In socorro_crash
there's an os version field (json_dump.system_info.os_ver) that gives version numbers like 10.5.0 for Darwin. I believe the numbers in clients_daily are for the Darwin kernel version, and those in socorro are for the actual Mac OS version.
So basically
Darwin kernel version ~ 18.6.0, 19.0.0, ...
- main.environment.system.os
- main.environment.system.os.version
- main.normalized_os_version
Mac OS version ~ 10.5, 10.6, ...
- socorro_crash.json_dump.system_info.os_ver
My workaround now is to look up the version correspondence on wikipedia, but is there a way we could get the OS version in the main ping/client's daily as well?
Comment 1•5 years ago
|
||
Windows has this problem as well. os.name
would be Windows_NT
and os.version
would be 6.1
for Windows 7
. I keep wikipedia handy for looking them up.
Of course, Microsoft did us a service by making everything 10 forever, and for aligning the NT kernel version to 10.0
at the same time (they could've used 7.0
and really confused things), so things have gotten easier... but I digress.
Chris, do you know anyone from the crash reporter side who could explain where that information comes from? To my knowledge we report os.version
this way because 1) This was more precise for perf/stability stuff 2) these kernel versions are the only information we have on the client. Windows 7 doesn't know it's 7, it only knows it's 6.1.
Also: is this the sort of mapping we could build into a UDF or as part of the normalization step for normalized_os_version
?
Reporter | ||
Comment 2•5 years ago
|
||
Chris, do you know anyone from the crash reporter side who could explain where that information comes from?
Gabriele, do you know where telemetry for socorro_crash gets json_dump.system_info.os_ver
from?
Comment 3•5 years ago
|
||
There's multiple ways to get that. The crash reporter does it via Breakpad, this is the relevant C++ code:
Note that it also grabs the buildid, not just the version.
The widget does it in a less contrived way by using Objective-C code:
Comment 4•5 years ago
|
||
Not sure who on the Data Engineering and Data Science side I should bring in here. Mark, what do you think about the appropriate place to put this sort of information? Is a udf
like I mentioned earlier a sensible place? Or should I stop stalling and triage this as a client-side Telemetry bug after all?
Comment 5•5 years ago
|
||
I am in favour of keeping a lookup table to map the kernel version (or windows version details) to meaningful names. Ideally we can automate the retrieval of such information - do you know of a canonical source for this kind of info that is machine-readable?
We can keep this information in the bigquery-etl repository as a slowly-changing "static" table for now, and potentially create a small UDF to more conveniently map normalized_os_version -> human-meaningful names.
Comment 6•5 years ago
|
||
Unfortunately I don't know of any machine-readable info of that nature (and a quick Google turns up nothing). Heck, we only recently built buildhub to do the same thing for our builds, so I guess I shouldn't be surprised that OSs are a little behind.
I can imagine we might want to do the same thing for Android API Level -> Android OS number. And maybe there's a similar thingy for iOS (I don't know)
Given the speed at which new OSs are released (slow) it might be manageable to do this manually. What do you think?
Comment 7•5 years ago
|
||
Sounds good, let's try it as a manual lookup table first.
Would a table of (name, version, display_name, release_date) suffice? Example rows would be:
Darwin, 17.6.0, macOS 10.13.5, 2018-06-01
Windows_NT, 6.1, Windows 7, 2009-07-22
Comment 8•5 years ago
|
||
Works for me. What say you, Mr Beard?
Reporter | ||
Comment 9•5 years ago
|
||
It seems it could be useful to not have to parse out the macOS
if I'm just wanting the 10.13.5
number. Though I might need to do parsing to extract the numerical components anyways if I want to do things like sorting, so it may not buy much...
Is the reason for keeping macOS
together with 10.13.5
in the display_name
so as not to get the 2 kinds of versions mixed up?
I can't think of anything better, so if there's not a better way around the ambiguity then this spec looks fine to me.
Some convention would be nice though, so that breaking out the name from the number (macOS
vs 10.13.5
, or Windows
vs 7
) would only require splitting on a space.
Comment 10•5 years ago
|
||
I like this idea. Would it make sense to call these display_version
and display_family
?
New examples:
name, version, display_name, display_family, display_version, release_date
Darwin, 17.6.0, macOS 10.13.5, macOS, 10.13.5, 2018-06-01
Windows_NT, 6.1, Windows 7, Windows, 7, 2009-07-22
Reporter | ||
Comment 11•5 years ago
|
||
I'm not familiar with the connotations/conventions around 'family' as it relates to OS versioning, but this format makes sense to me (and I can't think of a better alternative).
I think display_version
also makes sense given other contexts I've seen in our telemetry.
Assignee | ||
Comment 12•5 years ago
|
||
There is some precedent for this with country codes in the static dataset: https://github.com/mozilla/bigquery-etl/tree/master/templates/static
This keeps everything in a csv and gets loaded into bigquery using a script but I think that's a little too manual. If we create an os lookup as a view it would automatically update when we do our view deploys. I'll take a stab at that and report back.
Comment 13•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 14•5 years ago
•
|
||
I created tables normalized_os_name
and normalized_os_version
in the static dataset in derived datasets and shared prod. To get the normalized os version you would need to do something like
...
FROM
`moz-fx-data-shared-prod.telemetry_stable.main_v4`
LEFT JOIN
`static.normalized_os_version` norm_os
ON
(environment.system.os.name = norm_os.os_name
AND REGEXP_CONTAINS(environment.system.os.version, norm_os.os_version))
This is a bit awkward because UDFs don't allow reading from a table. Chris, is this way good enough for your use case?
Also the plan is to keep the table as a csv in bigquery-etl
so it will be easy to change.
Comment 16•5 years ago
|
||
Assignee | ||
Comment 17•5 years ago
|
||
Changes have been merged and table data is stored here: https://github.com/mozilla/bigquery-etl/tree/master/templates/static
The os tables are static.normalized_os_version
and static.normalized_os_name
.
An example query for os version can be found here: https://sql.telemetry.mozilla.org/queries/67040/source
Comment 18•5 years ago
|
||
This is great! Can you add an article / example on docs.tmo too?
Assignee | ||
Comment 19•5 years ago
|
||
Added documentation here: https://docs.telemetry.mozilla.org/datasets/static/normalized_os.html
Updated•3 years ago
|
Description
•