Closed Bug 1591207 Opened 5 years ago Closed 5 years ago

Get Darwin OS version in the main ping and clients_daily

Categories

(Data Platform and Tools :: General, enhancement)

Desktop
macOS
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wbeard, Assigned: benwu)

Details

Attachments

(2 files)

The normalized_os_version field in clients daily gives version numbers like 18.6.0 (and several field in the main ping that correspond to this). In socorro_crash there's an os version field (json_dump.system_info.os_ver) that gives version numbers like 10.5.0 for Darwin. I believe the numbers in clients_daily are for the Darwin kernel version, and those in socorro are for the actual Mac OS version.

So basically
Darwin kernel version ~ 18.6.0, 19.0.0, ...

  • main.environment.system.os
  • main.environment.system.os.version
  • main.normalized_os_version

Mac OS version ~ 10.5, 10.6, ...

  • socorro_crash.json_dump.system_info.os_ver

My workaround now is to look up the version correspondence on wikipedia, but is there a way we could get the OS version in the main ping/client's daily as well?

Windows has this problem as well. os.name would be Windows_NT and os.version would be 6.1 for Windows 7. I keep wikipedia handy for looking them up.

Of course, Microsoft did us a service by making everything 10 forever, and for aligning the NT kernel version to 10.0 at the same time (they could've used 7.0 and really confused things), so things have gotten easier... but I digress.


Chris, do you know anyone from the crash reporter side who could explain where that information comes from? To my knowledge we report os.version this way because 1) This was more precise for perf/stability stuff 2) these kernel versions are the only information we have on the client. Windows 7 doesn't know it's 7, it only knows it's 6.1.

Also: is this the sort of mapping we could build into a UDF or as part of the normalization step for normalized_os_version?

Flags: needinfo?(wbeard)

Chris, do you know anyone from the crash reporter side who could explain where that information comes from?

Gabriele, do you know where telemetry for socorro_crash gets json_dump.system_info.os_ver from?

Flags: needinfo?(wbeard) → needinfo?(gsvelto)

There's multiple ways to get that. The crash reporter does it via Breakpad, this is the relevant C++ code:

https://searchfox.org/mozilla-central/rev/2a355e56c490768be676689de18629485c9d699b/toolkit/crashreporter/breakpad-client/mac/handler/minidump_generator.cc#128-191

Note that it also grabs the buildid, not just the version.

The widget does it in a less contrived way by using Objective-C code:

https://searchfox.org/mozilla-central/rev/2a355e56c490768be676689de18629485c9d699b/widget/cocoa/nsCocoaFeatures.mm#60-77

Flags: needinfo?(gsvelto)

Not sure who on the Data Engineering and Data Science side I should bring in here. Mark, what do you think about the appropriate place to put this sort of information? Is a udf like I mentioned earlier a sensible place? Or should I stop stalling and triage this as a client-side Telemetry bug after all?

Flags: needinfo?(mreid)

I am in favour of keeping a lookup table to map the kernel version (or windows version details) to meaningful names. Ideally we can automate the retrieval of such information - do you know of a canonical source for this kind of info that is machine-readable?

We can keep this information in the bigquery-etl repository as a slowly-changing "static" table for now, and potentially create a small UDF to more conveniently map normalized_os_version -> human-meaningful names.

Flags: needinfo?(mreid)

Unfortunately I don't know of any machine-readable info of that nature (and a quick Google turns up nothing). Heck, we only recently built buildhub to do the same thing for our builds, so I guess I shouldn't be surprised that OSs are a little behind.

I can imagine we might want to do the same thing for Android API Level -> Android OS number. And maybe there's a similar thingy for iOS (I don't know)

Given the speed at which new OSs are released (slow) it might be manageable to do this manually. What do you think?

Component: Telemetry → Datasets: General
Flags: needinfo?(mreid)
Product: Toolkit → Data Platform and Tools

Sounds good, let's try it as a manual lookup table first.

Would a table of (name, version, display_name, release_date) suffice? Example rows would be:

Darwin, 17.6.0, macOS 10.13.5, 2018-06-01
Windows_NT, 6.1, Windows 7, 2009-07-22
Flags: needinfo?(mreid) → needinfo?(chutten)

Works for me. What say you, Mr Beard?

Flags: needinfo?(chutten) → needinfo?(wbeard)

It seems it could be useful to not have to parse out the macOS if I'm just wanting the 10.13.5 number. Though I might need to do parsing to extract the numerical components anyways if I want to do things like sorting, so it may not buy much...

Is the reason for keeping macOS together with 10.13.5 in the display_name so as not to get the 2 kinds of versions mixed up?

I can't think of anything better, so if there's not a better way around the ambiguity then this spec looks fine to me.

Some convention would be nice though, so that breaking out the name from the number (macOS vs 10.13.5, or Windows vs 7) would only require splitting on a space.

Flags: needinfo?(wbeard) → needinfo?(mreid)

I like this idea. Would it make sense to call these display_version and display_family?

New examples:

name, version, display_name, display_family, display_version, release_date
Darwin, 17.6.0, macOS 10.13.5, macOS, 10.13.5, 2018-06-01
Windows_NT, 6.1, Windows 7, Windows, 7, 2009-07-22
Flags: needinfo?(mreid) → needinfo?(wbeard)

I'm not familiar with the connotations/conventions around 'family' as it relates to OS versioning, but this format makes sense to me (and I can't think of a better alternative).

I think display_version also makes sense given other contexts I've seen in our telemetry.

Flags: needinfo?(wbeard)

There is some precedent for this with country codes in the static dataset: https://github.com/mozilla/bigquery-etl/tree/master/templates/static

This keeps everything in a csv and gets loaded into bigquery using a script but I think that's a little too manual. If we create an os lookup as a view it would automatically update when we do our view deploys. I'll take a stab at that and report back.

Assignee: nobody → bewu

I created tables normalized_os_name and normalized_os_version in the static dataset in derived datasets and shared prod. To get the normalized os version you would need to do something like

...
FROM
  `moz-fx-data-shared-prod.telemetry_stable.main_v4`
LEFT JOIN
  `static.normalized_os_version` norm_os
ON
  (environment.system.os.name = norm_os.os_name
  AND REGEXP_CONTAINS(environment.system.os.version, norm_os.os_version))

This is a bit awkward because UDFs don't allow reading from a table. Chris, is this way good enough for your use case?

Also the plan is to keep the table as a csv in bigquery-etl so it will be easy to change.

Flags: needinfo?(wbeard)

Very nifty, LGTM. Thanks!

Flags: needinfo?(wbeard)

Changes have been merged and table data is stored here: https://github.com/mozilla/bigquery-etl/tree/master/templates/static

The os tables are static.normalized_os_version and static.normalized_os_name.

An example query for os version can be found here: https://sql.telemetry.mozilla.org/queries/67040/source

This is great! Can you add an article / example on docs.tmo too?

Flags: needinfo?(bewu)
Flags: needinfo?(bewu)

Thanks!

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: Datasets: General → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: