Create a JSON schema to describe the graph that is serialized for session restore
Categories
(Firefox :: Session Restore, task, P3)
Tracking
()
| | Tracking | Status |
|---|---|---|
| firefox140 | --- | fixed |
People
(Reporter: sfoster, Assigned: sfoster)
References
Details
(Whiteboard: [fidefe-session-restore])
Attachments
(1 file)
In bug 1849393 we identified a need for a formal schema that defines what goes into a session file. A session file is an in-memory JavaScript graph that gets serialized to JSON, compressed, and written to disk as a .jsonlz4 file, allowing a user to seamlessly resume a session after a restart or crash.
The basic structure looks something like this:
{
  "version": [
    "sessionrestore",
    1
  ],
  "windows": [
    {
      "tabs": [
        {
          "entries": [
            {
              "url": "https://example.com/",
              "title": "Page title",
              "hasUserInteraction": false,
              "triggeringPrincipal_base64": "{\"3\":{}}"
            },
            ...
          ],
          "lastAccessed": 1697149705740,
          "hidden": false,
          "searchMode": null,
          "userContextId": 0,
          "attributes": {},
          "index": 3,
          "requestedIndex": 0,
          "image": ".."
        },
        ...
      ],
      "_closedTabs": [
        {
          "state": {
            "entries": [
              {
                "url": "https://elsewhere.com/",
                "title": "Page title",
                "resultPrincipalURI": null,
                "principalToInherit_base64": "{\"0\":{\"0\":\"moz-nullprincipal:{b8140753-f226-4c5d-9749-fc3eea899d9f}\"}}",
                "hasUserInteraction": true,
                "triggeringPrincipal_base64": "{\"3\":{}}",
                "persist": true
              }
            ],
            "lastAccessed": 1697142677640,
            "hidden": false,
            "searchMode": null,
            "userContextId": 0,
            "attributes": {},
            "index": 1,
            "requestedIndex": 0,
            "image": "..."
          },
          "title": "Page title",
          "image": "...",
          "pos": 0,
          "closedAt": 1697147353955,
          "closedInGroup": false,
          "removeAfterRestore": true,
          "closedId": 3,
          "sourceWindowId": "window0"
        },
        ...
      ]
    },
    ...
  ],
  "selectedWindow": 0,
  "_closedWindows": [
    ...
  ],
  "session": {
    "lastUpdate": 1697149716870,
    "startTime": 1697149627011,
    "recentCrashes": 1
  },
  "global": {},
  "cookies": [
    {
      "host": "example.com",
      "value": "af6e3...",
      "path": "/",
      "name": "someName",
      "secure": true,
      "httponly": true,
      "expiry": 1697142669447,
      "originAttributes": {
        "firstPartyDomain": "",
        "geckoViewSessionContextId": "",
        "inIsolatedMozBrowser": false,
        "partitionKey": "",
        "privateBrowsingId": 0,
        "userContextId": 3
      },
      "sameSite": 1,
      "schemeMap": 2
    },
    ...
  ]
}
...but there is quite a lot of detail down at the individual entries level for each tab.
Having a schema would be useful in tests. It would also give us a known quantity and a jumping-off point for any future optimization or re-architecting of how session (re)store works.
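To make the nesting above concrete, here is a minimal sketch that walks a parsed session object and counts windows, tabs, history entries and closed tabs. The helper name summarizeSession and the sample data are hypothetical and only follow the outline shown above; real session files carry many more properties.

```javascript
// Hypothetical helper: counts the main collections in a parsed session object.
// Missing collections are treated as empty, which is an assumption of this sketch.
function summarizeSession(session) {
  const windows = session.windows ?? [];
  let tabs = 0;
  let entries = 0;
  let closedTabs = 0;
  for (const win of windows) {
    for (const tab of win.tabs ?? []) {
      tabs += 1;
      entries += (tab.entries ?? []).length;
    }
    closedTabs += (win._closedTabs ?? []).length;
  }
  return { windows: windows.length, tabs, entries, closedTabs };
}

// Made-up sample data shaped like the outline above.
const sample = {
  version: ["sessionrestore", 1],
  windows: [
    {
      tabs: [
        { entries: [{ url: "https://example.com/" }] },
        {
          entries: [
            { url: "https://example.com/a" },
            { url: "https://example.com/b" },
          ],
        },
      ],
      _closedTabs: [
        { state: { entries: [{ url: "https://elsewhere.com/" }] } },
      ],
    },
  ],
};

const summary = summarizeSession(sample);
console.log(summary);
```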
| Assignee | ||
Comment 1•1 year ago
I have a work-in-progress for the JSON schema at https://github.com/sfoster/moz-sessionrestore-tools; the draft schema itself is at session-schema.json. Once that is closer to done and we have figured out where in the tree this should live, I'll get a patch on here. PRs welcome in the meantime.
| Assignee | ||
Comment 2•10 months ago
:adw it looks like maybe you or :mak might be able to answer this question. When validating an array, JsonSchemaValidator seems to assume arrays of items should all have the same type? That's not how I understand the spec though - for draft-07 at least, each item in an array can have its own schema to validate it.
As a concrete example, to validate the version property of the session restore document I have:
"version": {
  "type": "array",
  "items": [
    {"type": "string"},
    {"type": "integer"}
  ]
}
and the example input looks like:
"version": [
  "sessionrestore",
  1
],
Did I misread this (the code or the spec) or do we need a patch on the validator implementation to support this?
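For reference, the draft-07 behaviour described here (an array-valued items keyword, where the schema at position i applies to the element at position i) can be sketched in a few lines. This is only an illustration of the positional semantics, not the JsonSchemaValidator code; the function names are hypothetical and only the "string" and "integer" types are handled.

```javascript
// Sketch of draft-07 "tuple" validation for an array-valued `items` keyword.
// Not a full validator: only two types are supported.
function matchesType(value, type) {
  if (type === "string") return typeof value === "string";
  if (type === "integer") return Number.isInteger(value);
  throw new Error(`unsupported type in this sketch: ${type}`);
}

function validateTuple(instance, itemSchemas) {
  if (!Array.isArray(instance)) return false;
  // Positions beyond the end of the instance are simply not checked, and
  // elements beyond the listed schemas are left unconstrained here
  // (`additionalItems` is not modelled).
  return itemSchemas.every(
    (schema, i) => i >= instance.length || matchesType(instance[i], schema.type)
  );
}

const versionItems = [{ type: "string" }, { type: "integer" }];
const okResult = validateTuple(["sessionrestore", 1], versionItems);
const swappedResult = validateTuple([1, "sessionrestore"], versionItems);
console.log(okResult, swappedResult);
```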
Comment 3•10 months ago
Some information that might be useful in the schema is the importance of each field. For example:
- Opening a new tab is of high importance, and it should be synced to disk immediately.
- But the position of a page is not that important. In fact, syncing to disk every time the user scrolls is quite wasteful. I assume there are more fields like this.
| Assignee | ||
Comment 4•10 months ago
(In reply to Dimitrios Apostolou from comment #3)
Some information that might be useful in the schema is the importance of each field. For example:
- Opening a new tab is of high importance, and it should be synced to disk immediately.
- But the position of a page is not that important. In fact, syncing to disk every time the user scrolls is quite wasteful. I assume there are more fields like this.
We are planning on eventually moving to an incremental write model, where the cost of a single property update will be greatly reduced. If there's still a need for tracking what kinds of changes have been made, these kinds of weighting values could potentially live in the schema or just in the code. Similar heuristics elsewhere are typically just implemented in code.
Comment 5•10 months ago
Sorry for the delay, I was out until today. JsonSchemaValidator isn't quite standard and we should probably replace it with a standardized one at some point.
items can't be an array but type can, so this schema fragment should work for you:
{
  type: "array",
  items: {
    type: ["string", "integer"],
  },
},
Here's a fuller example:
JsonSchemaValidator.validate(
  {
    version: [
      "sessionrestore",
      1
    ],
  },
  {
    type: "object",
    properties: {
      version: {
        type: "array",
        items: {
          type: ["string", "integer"],
        },
      },
    },
  }
);
Comment 6•10 months ago
But there's no way to say "the first element must be a string and the second element must be an int," so if you need that, you'll have to do some post-validation validation on your own. In that case I would suggest not using an array at all but an object instead.
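A rough sketch of what that post-validation step could look like follows. The union-type check stands in for what the schema fragment above can express, and hasExpectedVersionShape is a hypothetical helper for the ordering check; neither is JsonSchemaValidator code.

```javascript
// Stand-in for the schema-level check: every element may be a string or an integer.
function matchesAnyType(value, types) {
  return types.some(type => {
    if (type === "string") return typeof value === "string";
    if (type === "integer") return Number.isInteger(value);
    return false; // other types are not handled in this sketch
  });
}

function validateUnionItems(instance, types) {
  return (
    Array.isArray(instance) &&
    instance.every(value => matchesAnyType(value, types))
  );
}

// Hypothetical post-validation: enforce the order the schema cannot express.
function hasExpectedVersionShape(version) {
  return (
    Array.isArray(version) &&
    version.length === 2 &&
    typeof version[0] === "string" &&
    Number.isInteger(version[1])
  );
}

const swapped = [1, "sessionrestore"];
// The swapped array satisfies the union type, so only the post-check catches it.
const passesSchema = validateUnionItems(swapped, ["string", "integer"]);
const passesPostCheck = hasExpectedVersionShape(swapped);
console.log(passesSchema, passesPostCheck);
```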
Comment 7•7 months ago
(In reply to Sam Foster [:sfoster] (he/him) from comment #4)
We are planning on eventually moving to an incremental write model, where the cost of a single property update will be greatly reduced. If there's still a need for tracking what kinds of changes have been made, these kinds of weighting values could potentially live in the schema or just in the code. Similar heuristics elsewhere are typically just implemented in code.
When we start going forward with this, we might want to revisit how core-session-restore integrates, since data collection from content documents is very incremental. Is there a bug for the incremental work?
| Assignee | ||
Comment 8•7 months ago
(In reply to Andreas Farre [:farre] from comment #7)
When we start going forward with this, we might want to revisit how core-session-restore integrates, since data collection from content documents is very incremental. Is there a bug for the incremental work?
There's no bug yet, just the discussion in bug 1849393 in which incremental updates are identified as a solution to the problem.
| Assignee | ||
Comment 9•6 months ago
| Assignee | ||
Comment 10•6 months ago
I'm attaching a snapshot of the WIP of this patch: this is an xpcshell test that takes the draft schema and validates a session data file against it, passing if it validates, failing if not.
- I'm using Validator in the JsonSchema module, rather than JsonSchemaValidator.sys.mjs. That follows $refs properly and has better exceptions.
- The structure of the entities and the references and where all that lives is work-in-progress. We may want to run validation at runtime when some debug pref is enabled, so the test directory is not the final home for the schema. Though it may be good for now for this particular bug/patch.
- And we likely want separate files for some of these entities which have meaning and value outside this particular use case. Like history entries, sidebar properties, cookies, etc.
So far, it's loading up the right documents, validating stuff, and validation fails as expected if you put invalid data in the session data. But it's not yet flagging missing data correctly. I think there are a few cases where data can be wholly missing, but if it's there it should be valid. And some where it must be there (required).
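The distinction between "may be missing, but must be valid if present" and "must be there" can be sketched as below. This is a hand-rolled illustration rather than the Validator from the JsonSchema module, and which tab properties end up required is an assumption made only for the example.

```javascript
// Hypothetical rule table: `required` marks properties that must be present;
// every property that is present must pass its `check`.
const tabRules = {
  entries: { required: true, check: value => Array.isArray(value) },
  lastAccessed: { required: false, check: value => Number.isInteger(value) },
};

function checkProperties(obj, rules) {
  const errors = [];
  for (const [name, rule] of Object.entries(rules)) {
    if (!(name in obj)) {
      // Absent optional properties are fine; absent required ones are not.
      if (rule.required) errors.push(`missing required property: ${name}`);
      continue;
    }
    if (!rule.check(obj[name])) {
      errors.push(`invalid value for property: ${name}`);
    }
  }
  return errors;
}

const validTab = checkProperties({ entries: [] }, tabRules);
const missingEntries = checkProperties({ lastAccessed: 1697149705740 }, tabRules);
const badOptional = checkProperties({ entries: [], lastAccessed: "soon" }, tabRules);
console.log(validTab, missingEntries, badOptional);
```

In JSON Schema terms, the first case corresponds to listing a property under properties only, and the second to also naming it in the required array.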
Comment 11•6 months ago
Comment 12•6 months ago
| bugherder | ||