org.everit.json.schema.ValidationException: #/client_info: required key [app_build]
Categories: Data Platform and Tools :: Glean: SDK, defect, P1
Tracking: Not tracked
People: Reporter: Dexter; Assigned: mdroettboom
Attachments: 1 file
org-mozilla-fennec-aurora org.everit.json.schema.ValidationException: #/client_info: required key [app_build] not found baseline 927
We're seeing a bunch of failures like the above across all the app ids. We should figure out why that's happening.
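To make the failure concrete, here is a minimal reproduction in Python using the jsonschema package (the pipeline actually uses the Java everit validator, and the schema fragment below is a simplified stand-in for the real baseline schema):

import jsonschema

# Simplified stand-in for the baseline ping schema: client_info must
# contain an app_build key.
schema = {
    "type": "object",
    "properties": {
        "client_info": {
            "type": "object",
            "required": ["app_build"],
        },
    },
}

# A ping whose client_info was only partially populated. This raises
# jsonschema.exceptions.ValidationError: "'app_build' is a required
# property" -- the Python analogue of the everit error above.
ping = {"client_info": {"os_version": "10"}}
jsonschema.validate(ping, schema)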
Comment 1•4 years ago
Maybe this has something to do with the way we are getting this info being deprecated? It says deprecated since API level 28, so I wonder if this is mostly newer devices?
Comment 2•4 years ago (Reporter)
(In reply to Travis Long [:travis_] from comment #1)
> Maybe this has something to do with the way we are getting this info being deprecated? It says deprecated since API level 28, so I wonder if this is mostly newer devices?

Maybe, but I'd be very surprised. Deprecated doesn't mean removed :)
Comment 4•4 years ago (Assignee)
There are more missing elements than just app_build -- it just seems that app_build is the first one the schema validator checks, so it comes up first. It actually looks like anywhere between 1 and 9 elements are missing from client_info.

Looking at the first 500 of these errors: https://sql.telemetry.mozilla.org/queries/72154

The missing elements always match the order in which they are set in initializeCoreMetrics -- that is, the missing ones are always contiguous and at the end of the order in which that function sets them. This suggests to me that there is a race condition between initializeCoreMetrics and ping collection.

Here is the Python script used to confirm this observation (it prints any observed key set that is not a contiguous prefix of the order below):
import base64
import csv
import gzip
import json

# Payloads exported from the Redash query above.
rows = csv.reader(open("Glean_Validation_Errors_2020_06_22.csv"))

header = next(rows)
idx = header.index("payload")

entries = []
key_sets = set()
for row in rows:
    # Each payload is a base64-encoded, gzipped JSON ping.
    x = base64.b64decode(row[idx])
    x = gzip.decompress(x)
    x = json.loads(x)
    entries.append(x)
    key_sets.add(tuple(sorted(x["client_info"].keys())))

# The order in which initializeCoreMetrics sets the core metrics.
order = [
    "android_sdk_version",
    "os_version",
    "device_manufacturer",
    "device_model",
    "architecture",
    "app_channel",
    "app_build",
    "app_display_version",
]

# Print any key set whose present metrics are not a contiguous prefix
# of the order above, i.e. any counterexample to the race-condition
# theory.
for key_set in key_sets:
    indices = sorted(order.index(x) for x in key_set if x in order)
    if indices != list(range(len(indices))):
        print(key_set)
Now to find a way to prevent the race condition.
Comment 5•4 years ago
Comment 6•4 years ago (Assignee)
More info: the behavior described in the previous comment is only present in the dirty_startup ping. The foreground and background pings either have all core metrics or none.
Comment 7•4 years ago
I cast my vote for not saving or sending a dirty_startup ping if it doesn't have all its core metrics (recording an error metric in that case, so we can monitor the number of pings we're not sending as a result).
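A minimal Python sketch of this proposal, for illustration only -- the actual SDK is not written in Python, and none of these names (maybe_submit_dirty_startup, the callables passed in) are real Glean APIs:

REQUIRED_CLIENT_INFO = {
    "android_sdk_version", "os_version", "device_manufacturer",
    "device_model", "architecture", "app_channel", "app_build",
    "app_display_version",
}

def maybe_submit_dirty_startup(client_info, submit_ping, record_error):
    """Submit the dirty_startup ping only if all core metrics are set."""
    missing = REQUIRED_CLIENT_INFO - client_info.keys()
    if missing:
        # Record an error metric instead, so we can monitor how many
        # pings we drop this way.
        record_error("dirty_startup_incomplete", sorted(missing))
        return False
    submit_ping("dirty_startup", client_info)
    return True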
Comment 8•4 years ago (Assignee)
This seems to suggest that Fenix is stopping while initializeCoreMetrics is running. Upon the next startup, the dirty_startup ping is sent with incomplete data (it's only after this that the metrics are filled in again).

We have a few imperfect options to resolve this:

- Don't send a dirty_startup ping if some of these values are missing in the database -- if the previous session was really that short, maybe the data isn't interesting anyway. From a data analysis perspective, this is the same situation we have now, since the pings are being dropped at the endpoint.
- Fill in the missing values with "unknown" placeholders so the pings pass schema validation and get entered into the database (see the sketch after this list).
- Fill in the core metrics before sending the dirty_startup ping. This wouldn't strictly be correct, because the app could have been updated since we last ran, causing a number of the values to change. But for baseline pings, does that edge case matter?
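A sketch of the second option, again in Python purely for illustration (backfill_client_info is a hypothetical helper, not part of the SDK):

# The core metrics required by the baseline schema.
CORE_METRICS = [
    "android_sdk_version", "os_version", "device_manufacturer",
    "device_model", "architecture", "app_channel", "app_build",
    "app_display_version",
]

def backfill_client_info(client_info):
    """Return a copy of client_info with every required key present,
    using "unknown" for anything that was never recorded."""
    filled = dict(client_info)
    for key in CORE_METRICS:
        filled.setdefault(key, "unknown")
    return filled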
Comment 9•4 years ago (Reporter)
(In reply to Michael Droettboom [:mdroettboom] from comment #6)
> More info: the behavior described in the previous comment is only present in the dirty_startup ping. The foreground and background pings either have all core metrics or none.

This is probably indicating a second problem: maybe Fenix had a startup crash?
Comment 10•4 years ago (Assignee)
(In reply to Alessio Placitelli [:Dexter] from comment #9)
> This is probably indicating a second problem: maybe Fenix had a startup crash?

Hmm... I wonder if there's any way to correlate with the Sentry data (probably not). I don't think we have any way of distinguishing a crash from a "the OS just suddenly shut us down because it can" event.
Comment 11•4 years ago (Assignee)
Another option (4): we don't set the dirty bit until Glean has successfully completed initialization. That way, if we "crash" or "stop" during Glean initialization, we won't get a dirty_startup ping the next time.
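A sketch of option (4) in Python, loosely mirroring how the dirty bit works: the flag is set while the app runs and cleared on a clean shutdown, so finding it set at startup means the previous session ended uncleanly. Everything here (the storage dict, the function names) is illustrative, not the real SDK:

def startup(storage, initialize_core_metrics, submit_dirty_startup):
    if storage.get("dirty_flag"):
        # With this ordering, the flag can only have been set after the
        # previous session finished initialization, so the stored core
        # metrics are guaranteed to be complete.
        submit_dirty_startup(storage)

    initialize_core_metrics(storage)

    # Only now mark the session dirty: if we are killed during the
    # lines above, the flag stays clear and the next startup won't send
    # a dirty_startup ping built from partial core metrics.
    storage["dirty_flag"] = True

def clean_shutdown(storage):
    storage["dirty_flag"] = False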