Closed Bug 1638877 Opened 4 years ago Closed 4 years ago

org.everit.json.schema.ValidationException: #/client_info: required key [app_build]

Categories

(Data Platform and Tools :: Glean: SDK, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: mdroettboom)

References

Details

Attachments

(1 file)

org-mozilla-fennec-aurora org.everit.json.schema.ValidationException: #/client_info: required key [app_build] not found baseline 927

We're seeing a bunch of failures like the above across all the app ids. We should figure out why that's happening.

Priority: -- → P3
Whiteboard: [telemetry:glean-rs:m?]
Priority: P3 → P2

Maybe this has something to do with the API we use to get this info being deprecated? It's marked deprecated since API level 28, so I wonder if this is mostly newer devices?

(In reply to Travis Long [:travis_] from comment #1)

Maybe this has something to do with the API we use to get this info being deprecated? It's marked deprecated since API level 28, so I wonder if this is mostly newer devices?

Maybe, but I'd be very surprised. Deprecated doesn't mean removed :)

Assignee: nobody → mdroettboom
Priority: P2 → P1

There are more missing elements than just app_build -- it just seems that it's the first one checked by the schema validator, so it comes up first. It actually looks like anywhere between 1 and 9 elements are missing from client_info.

Looking at the first 500 of these errors: https://sql.telemetry.mozilla.org/queries/72154

The missing elements always match the order in which they are set in initializeCoreMetrics -- that is, the missing ones are always a contiguous run at the end of the order in which they are set in that function. This suggests there is a race condition between initializeCoreMetrics and ping collection.

The Python script below confirms this observation:

import base64
import csv
import gzip
import json

# Export of the first 500 validation errors from the Redash query above.
rows = csv.reader(open("Glean_Validation_Errors_2020_06_22.csv"))

# The header row tells us which column holds the gzipped, base64-encoded payload.
header = next(rows)
idx = header.index("payload")

entries = []
key_sets = set()

for row in rows:
    payload = base64.b64decode(row[idx])
    payload = gzip.decompress(payload)
    payload = json.loads(payload)
    entries.append(payload)

    # Record the distinct combinations of client_info keys that appear.
    key_sets.add(tuple(sorted(payload["client_info"].keys())))

# The order in which initializeCoreMetrics sets the core metrics.
order = [
    "android_sdk_version",
    "os_version",
    "device_manufacturer",
    "device_model",
    "architecture",
    "app_channel",
    "app_build",
    "app_display_version",
]

# If the keys that are present always form a prefix of `order`, their indices
# are exactly 0..n-1; print any combination that breaks that pattern.
for keys in key_sets:
    indices = sorted(order.index(k) for k in keys if k in order)
    if indices != list(range(len(indices))):
        print(keys)

Now to find a way to prevent the race condition.

Blocks: 1645133

More info: the behavior described in the previous comment is only present in the dirty_startup ping. The foreground and background pings either have all core metrics or none.

I cast my vote for not saving or sending a dirty_startup ping if it doesn't have all of its core metrics (and recording an error metric in that case, so we can monitor the number of pings we're not sending as a result).

This suggests that Fenix is stopping while initializeCoreMetrics is running. On the next startup, the dirty_startup ping is sent with incomplete data (the core metrics are only filled in again after that).

We have a few imperfect options to resolve this:

  1. Don't send a dirty_startup ping if some of these values are missing in the database -- if the previous session was really that short, maybe the data isn't interesting anyway. From a data analysis perspective, this is the same situation we have now, since the pings are being dropped at the endpoint. (A sketch of this approach follows the list below.)

  2. Fill in the missing values with "unknown" values so they pass the schema validation and get entered into the database.

  3. Fill in the core metrics before sending the dirty_startup ping. This wouldn't strictly be correct, because we could have updated the app, causing a number of the values to change since we last ran. But for baseline pings, does that edge case matter?
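
For illustration, here is a minimal Kotlin sketch of option 1, assuming a hypothetical completeness check over the persisted client_info. The names (ClientInfoStore, DirtyStartupSender, skippedCount) are invented for this sketch and are not real Glean SDK API.

// Hypothetical sketch of option 1: skip the dirty_startup ping when the
// persisted client_info is missing core metrics, and count how often that
// happens. None of these names exist in the real Glean SDK.

data class ClientInfoStore(val values: Map<String, String>) {
    companion object {
        // Core metrics written by initializeCoreMetrics, in the order they are set.
        val REQUIRED_KEYS = listOf(
            "android_sdk_version",
            "os_version",
            "device_manufacturer",
            "device_model",
            "architecture",
            "app_channel",
            "app_build",
            "app_display_version"
        )
    }

    // True only if every core metric made it into the persisted store.
    fun isComplete(): Boolean = REQUIRED_KEYS.all { it in values }
}

class DirtyStartupSender(private val send: (Map<String, String>) -> Unit) {
    // Stand-in for the proposed error metric: how many pings were dropped.
    var skippedCount = 0
        private set

    fun maybeSendDirtyStartup(store: ClientInfoStore) {
        if (store.isComplete()) {
            send(store.values)
        } else {
            // The previous session died mid-initializeCoreMetrics; the ping
            // would fail schema validation at the pipeline anyway, so drop it
            // locally and record that we did.
            skippedCount++
        }
    }
}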

(In reply to Michael Droettboom [:mdroettboom] from comment #6)

More info: the behavior described in the previous comment is only present in the dirty_startup ping. The foreground and background pings either have all core metrics or none.

This is probably indicating a second problem: maybe Fenix had a startup crash?

(In reply to Alessio Placitelli [:Dexter] from comment #9)

(In reply to Michael Droettboom [:mdroettboom] from comment #6)

More info: the behavior described in the previous comment is only present in the dirty_startup ping. The foreground and background pings either have all core metrics or none.

This is probably indicating a second problem: maybe Fenix had a startup crash?

Hmm... I wonder if there's any way to correlate with the Sentry data (probably not). I don't think we have any way of distinguishing a crash from a "the OS just suddenly shut us down because it can" event.

Another option (4): we don't set the dirty bit until Glean has successfully completed initialization. That way, if we "crash" or "stop" during Glean initialization, we won't get a dirty_startup ping the next time.
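
A rough Kotlin sketch of option 4, assuming a hypothetical persisted flag; DirtyFlagStore and the function names are illustrative only, not the real SDK implementation.

// Hypothetical sketch of option 4: the dirty flag is only set once
// initialization (including the core metrics) has completed, so dying during
// init cannot produce a dirty_startup ping with incomplete client_info on the
// next run. None of these names are real Glean SDK API.

interface DirtyFlagStore {
    fun isDirty(): Boolean
    fun setDirty(value: Boolean)
}

fun startUp(
    flag: DirtyFlagStore,
    initializeCoreMetrics: () -> Unit,
    submitDirtyStartupPing: () -> Unit
) {
    // If the flag is still set, the previous run finished initialization but
    // never shut down cleanly, so its persisted client_info is complete and
    // the dirty_startup ping is safe to send.
    if (flag.isDirty()) {
        submitDirtyStartupPing()
    }

    // Collect the core metrics first...
    initializeCoreMetrics()

    // ...and only then mark the session dirty. If the process is killed before
    // this point, the next startup sees a clean flag and sends nothing.
    flag.setDirty(true)
}

fun shutDownCleanly(flag: DirtyFlagStore) {
    flag.setDirty(false)
}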

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED