Closed Bug 1905958 Opened 1 year ago Closed 6 months ago

Create a JSON schema to describe the graph that is serialized for session restore

Categories

(Firefox :: Session Restore, task, P3)

task

Tracking


RESOLVED FIXED
140 Branch
Tracking Status
firefox140 --- fixed

People

(Reporter: sfoster, Assigned: sfoster)

References

Details

(Whiteboard: [fidefe-session-restore])

Attachments

(1 file)

In bug 1849393 we identified a need for a formal schema that defines what goes into a session file. The session state is an in-memory JavaScript object graph that is serialized to JSON, compressed, and written to disk as a .jsonlz4 file, allowing a user to seamlessly resume a session after a restart or crash.
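As background on the on-disk container, here is a minimal sketch that inspects a .jsonlz4 header. It assumes the mozLz4 layout as it is commonly documented (an 8-byte "mozLz40\0" magic, a little-endian uint32 holding the decompressed size, then a raw LZ4 block). The helper name parseMozLz4Header is hypothetical, and LZ4 decompression itself is left out.

```javascript
// Hypothetical helper: inspect the header of a .jsonlz4 (mozLz4) file.
// Assumed layout: 8-byte magic "mozLz40\0", uint32 little-endian
// decompressed size, then a raw LZ4 block containing the JSON text.
const MOZLZ4_MAGIC = "mozLz40\0";
const MOZLZ4_HEADER_LENGTH = 12;

function parseMozLz4Header(bytes) {
  // bytes: Uint8Array holding the raw file contents.
  if (bytes.length < MOZLZ4_HEADER_LENGTH) {
    throw new Error("File too short to contain a mozLz4 header");
  }
  const magic = String.fromCharCode(...bytes.subarray(0, 8));
  if (magic !== MOZLZ4_MAGIC) {
    throw new Error("Not a mozLz4 file (bad magic)");
  }
  // The decompressed length is stored right after the magic.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const decompressedSize = view.getUint32(8, true);
  // Everything after the header is the compressed payload (not decoded here).
  return { decompressedSize, payload: bytes.subarray(MOZLZ4_HEADER_LENGTH) };
}

// Usage with a synthetic buffer: magic + size 1234 + 3 placeholder payload bytes.
const sample = new Uint8Array(MOZLZ4_HEADER_LENGTH + 3);
sample.set(Array.from(MOZLZ4_MAGIC, (c) => c.charCodeAt(0)), 0);
new DataView(sample.buffer).setUint32(8, 1234, true);
const header = parseMozLz4Header(sample);
console.log(header.decompressedSize, header.payload.length); // 1234 3
```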

The basic structure looks something like this:

{
  "version": [
    "sessionrestore",
    1
  ],
  "windows": [
    {
      "tabs": [
        {
          "entries": [
            {
              "url": "https://example.com/",
              "title": "Page title",
              "hasUserInteraction": false,
              "triggeringPrincipal_base64": "{\"3\":{}}"
            },
            ...
          ],
          "lastAccessed": 1697149705740,
          "hidden": false,
          "searchMode": null,
          "userContextId": 0,
          "attributes": {},
          "index": 3,
          "requestedIndex": 0,
          "image": "data:image/x-icon;base64,etc.."
        },
        ...
      ],
      "_closedTabs": [
        {
          "state": {
            "entries": [
              {
                "url": "https://elsewhere.com/",
                "title": "Page title",
                "resultPrincipalURI": null,
                "principalToInherit_base64": "{\"0\":{\"0\":\"moz-nullprincipal:{b8140753-f226-4c5d-9749-fc3eea899d9f}\"}}",
                "hasUserInteraction": true,
                "triggeringPrincipal_base64": "{\"3\":{}}",
                "persist": true
              }
            ],
            "lastAccessed": 1697142677640,
            "hidden": false,
            "searchMode": null,
            "userContextId": 0,
            "attributes": {},
            "index": 1,
            "requestedIndex": 0,
            "image": "..."
          },
          "title": "Page title",
          "image": "...",
          "pos": 0,
          "closedAt": 1697147353955,
          "closedInGroup": false,
          "removeAfterRestore": true,
          "closedId": 3,
          "sourceWindowId": "window0"
        },
        ...
      ]
    },
    ...
  ],
  "selectedWindow": 0,
  "_closedWindows": [
    ...
  ],
  "session": {
    "lastUpdate": 1697149716870,
    "startTime": 1697149627011,
    "recentCrashes": 1
  },
  "global": {},
  "cookies": [
    {
      "host": "example.com",
      "value": "af6e3...",
      "path": "/",
      "name": "someName",
      "secure": true,
      "httponly": true,
      "expiry": 1697142669447,
      "originAttributes": {
        "firstPartyDomain": "",
        "geckoViewSessionContextId": "",
        "inIsolatedMozBrowser": false,
        "partitionKey": "",
        "privateBrowsingId": 0,
        "userContextId": 3
      },
      "sameSite": 1,
      "schemeMap": 2
    },
    ...
  ]
}

...but there is a lot of detail down at the level of each tab's individual history entries.

A schema would be useful in tests, and would also give us a well-defined baseline and a starting point for any future optimization or re-architecting of how session store/restore works.
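As a starting point, a hypothetical draft-07 skeleton for just the top level of the example above might look like the sketch that follows. It is derived only from the keys shown in this comment and is not the draft at session-schema.json; undescribedKeys is an illustrative helper for spotting top-level keys that a schema does not yet cover.

```javascript
// Hypothetical draft-07 skeleton for the top level of a session file,
// derived only from the example above; NOT the draft schema in the repo.
const sessionSchemaSketch = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    version: { type: "array" },
    windows: { type: "array", items: { type: "object" } },
    selectedWindow: { type: "integer" },
    _closedWindows: { type: "array", items: { type: "object" } },
    session: {
      type: "object",
      properties: {
        lastUpdate: { type: "integer" }, // ms timestamp in the example
        startTime: { type: "integer" },
        recentCrashes: { type: "integer" },
      },
    },
    global: { type: "object" },
    cookies: { type: "array", items: { type: "object" } },
  },
};

// Report top-level keys in a document that the schema does not describe.
function undescribedKeys(doc, schema) {
  return Object.keys(doc).filter((key) => !(key in schema.properties));
}

console.log(
  undescribedKeys({ version: ["sessionrestore", 1], windows: [], extra: true }, sessionSchemaSketch)
); // [ 'extra' ]
```

Property types are read off the example values (timestamps as integers, for instance); a real schema would also need to decide which keys are required.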

See Also: → 1849393

I have a work-in-progress JSON schema at https://github.com/sfoster/moz-sessionrestore-tools; the draft schema itself is at session-schema.json. Once that is closer to done and we have figured out where in the tree it should live, I'll put a patch up here. PRs welcome in the meantime.

Assignee: nobody → sfoster
Status: NEW → ASSIGNED

:adw it looks like you or :mak might be able to answer this question. When validating an array, JsonSchemaValidator seems to assume that all items in the array should have the same type? That's not how I understand the spec, though: in draft-07 at least, each item in an array can have its own schema to validate it.

A concrete example: to validate the version property of the session restore document, I have:

"version": {
  "type": "array",
  "items": [
    {"type": "string"},
    {"type": "integer"}
  ]
}

and the example input looks like:

  "version": [
    "sessionrestore",
    1
  ],

Did I misread this (the code or the spec) or do we need a patch on the validator implementation to support this?
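To make the spec side of the question concrete, here is a minimal sketch of the draft-07 behavior being described: when items is an array, the schema at position i validates element i (tuple validation), and additionalItems governs any elements past the end. This is a hypothetical subset validator covering only type, items and additionalItems; it is not JsonSchemaValidator and not a complete implementation of the spec.

```javascript
// Minimal sketch of draft-07 "items" semantics (type + items + additionalItems only).
function typeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (Number.isInteger(value)) return "integer";
  return typeof value; // "string", "number", "boolean", "object"
}

function matchesType(value, type) {
  const types = Array.isArray(type) ? type : [type];
  const actual = typeOf(value);
  // In JSON Schema every integer is also a valid "number".
  return types.some((t) => t === actual || (t === "number" && actual === "integer"));
}

function validate(value, schema) {
  if (schema.type !== undefined && !matchesType(value, schema.type)) {
    return false;
  }
  if (Array.isArray(value) && schema.items !== undefined) {
    if (Array.isArray(schema.items)) {
      // Tuple form: the schema at position i validates element i;
      // elements past the end fall back to additionalItems.
      for (let i = 0; i < value.length; i++) {
        const itemSchema = i < schema.items.length ? schema.items[i] : schema.additionalItems;
        if (itemSchema === false) return false;
        if (itemSchema && itemSchema !== true && !validate(value[i], itemSchema)) return false;
      }
    } else {
      // List form: a single schema validates every element.
      return value.every((item) => validate(item, schema.items));
    }
  }
  return true;
}

const versionSchema = { type: "array", items: [{ type: "string" }, { type: "integer" }] };
console.log(validate(["sessionrestore", 1], versionSchema)); // true
console.log(validate([1, "sessionrestore"], versionSchema)); // false
```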

Flags: needinfo?(adw)

Some information that might be useful in the schema is the importance of each field. For example:

  • Opening a new tab is of high importance, and it should be synced to disk immediately.
  • But the position of a page is not that important. In fact, syncing to disk every time the user scrolls is quite wasteful. I assume there are more fields like this.

(In reply to Dimitrios Apostolou from comment #3)

> Some information that might be useful in the schema is the importance of each field. For example:
>
>   • Opening a new tab is of high importance, and it should be synced to disk immediately.
>   • But the position of a page is not that important. In fact, syncing to disk every time the user scrolls is quite wasteful. I assume there are more fields like this.

We are planning on eventually moving to an incremental write model, where the cost of a single property update will be greatly reduced. If there's still a need to track what kinds of changes have been made, these kinds of weighting values could live either in the schema or just in the code. Similar heuristics elsewhere are typically just implemented in code.
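If such weighting stays in code, it could be as simple as a lookup consulted when deciding how soon to write. In this hypothetical sketch, FIELD_PRIORITY and writeUrgency are illustrative names, the priority assignments are assumptions (scroll stands in for the page position mentioned above), and nothing here reflects how session store currently schedules writes.

```javascript
// Hypothetical per-field write priorities, kept in code rather than in the schema.
// Field names and priorities are illustrative assumptions only.
const FIELD_PRIORITY = new Map([
  ["entries", "immediate"], // e.g. a new tab or a navigation
  ["_closedTabs", "immediate"], // closing a tab should not be lost
  ["scroll", "deferred"], // page position changes constantly
  ["lastAccessed", "deferred"],
]);

// Decide whether a batch of changed fields needs an immediate write or can
// wait for the next scheduled (debounced) write.
function writeUrgency(changedFields) {
  const needsImmediate = changedFields.some(
    // Unknown fields default to "immediate" so they are never silently delayed.
    (field) => (FIELD_PRIORITY.get(field) ?? "immediate") === "immediate"
  );
  return needsImmediate ? "immediate" : "deferred";
}

console.log(writeUrgency(["scroll"])); // deferred
console.log(writeUrgency(["scroll", "entries"])); // immediate
```

Unknown fields default to an immediate write, so a newly added field can't go unsaved for a long time just because nobody has assigned it a priority yet.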

Sorry for the delay, I was out until today. JsonSchemaValidator isn't quite standard and we should probably replace it with a standardized one at some point.

items can't be an array, but type can, so this schema fragment should work for you:

  {
    type: "array",
    items: {
      type: ["string", "integer"],
    },
  },

Here's a fuller example:

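const { JsonSchemaValidator } = ChromeUtils.importESModule(
  "resource://gre/modules/components-utils/JsonSchemaValidator.sys.mjs"
);

JsonSchemaValidator.validate(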
  {
    version: [
      "sessionrestore",
      1
    ],
  },
  {
    type: "object",
    properties: {
      version: {
        type: "array",
        items: {
          type: ["string", "integer"],
        },
      },
    },
  }
);
Flags: needinfo?(adw)

But there's no way to say "the first element must be a string and the second element must be an int," so if you need that, you'll have to do some post-validation validation on your own. In that case I would suggest not using an array at all but an object instead.
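A minimal sketch of both suggestions, assuming the version value is always exactly a [name, integer] pair. isValidVersionTuple is a hypothetical extra check to run after JsonSchemaValidator accepts the looser union type (which on its own would also accept [1, "sessionrestore"]), and versionObjectSchema shows a hypothetical object shape; the property names name and number are made up for illustration.

```javascript
// Extra check to run after validating with items: { type: ["string", "integer"] },
// since that schema alone also accepts the elements in the wrong order.
function isValidVersionTuple(version) {
  return (
    Array.isArray(version) &&
    version.length === 2 &&
    typeof version[0] === "string" &&
    Number.isInteger(version[1])
  );
}

// Hypothetical alternative shape: an object, so each field gets its own schema.
// The property names "name" and "number" are illustrative, not the real format.
const versionObjectSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    number: { type: "integer" },
  },
  required: ["name", "number"],
};

console.log(isValidVersionTuple(["sessionrestore", 1])); // true
console.log(isValidVersionTuple([1, "sessionrestore"])); // false
```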

Whiteboard: [fidefe-session-restore]

(In reply to Sam Foster [:sfoster] (he/him) from comment #4)

> We are planning on eventually moving to an incremental write model, where the cost of a single property update will be greatly reduced. If there's still a need to track what kinds of changes have been made, these kinds of weighting values could live either in the schema or just in the code. Similar heuristics elsewhere are typically just implemented in code.

When we start going forward with this, we might want to revisit how core-session-restore integrates, since data collection from content documents is very incremental. Is there a bug for the incremental work?

(In reply to Andreas Farre [:farre] from comment #7)

> When we start going forward with this, we might want to revisit how core-session-restore integrates, since data collection from content documents is very incremental. Is there a bug for the incremental work?

There's no bug yet, just the discussion in bug 1849393 in which incremental updates are identified as a solution to the problem.

I'm attaching a snapshot of the WIP patch: it's an xpcshell test that takes the draft schema and validates a session data file against it, passing if the file validates and failing if not.

  • I'm using Validator in the JsonSchema module, rather than JsonSchemaValidator.sys.mjs. That follows $refs properly and has better exceptions.
  • The structure of the entities and references, and where they all live, is still a work in progress. We may want to run validation at runtime when some debug pref is enabled, so the test directory is not the final home for the schema, though it may be fine there for now for this particular bug/patch.
  • We likely also want separate files for some of these entities, which have meaning and value outside this particular use case, like history entries, sidebar properties, cookies, etc.
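Splitting entities into their own definitions typically relies on $ref. As an illustration of what following a $ref involves, here is a minimal sketch that resolves same-document references such as #/definitions/historyEntry using JSON Pointer (RFC 6901) rules. resolveLocalRef and the historyEntry definition are hypothetical, and cross-file references, which separate entity files would need, are out of scope.

```javascript
// Hypothetical resolver for same-document "$ref" values such as
// "#/definitions/historyEntry", following JSON Pointer (RFC 6901) rules.
function resolveLocalRef(root, ref) {
  if (!ref.startsWith("#")) {
    throw new Error(`Only same-document refs are supported: ${ref}`);
  }
  // The fragment is percent-decoded first, then read as a JSON Pointer.
  const pointer = decodeURIComponent(ref.slice(1));
  if (pointer === "") {
    return root; // "#" refers to the whole document
  }
  if (!pointer.startsWith("/")) {
    throw new Error(`Not a JSON Pointer: ${ref}`);
  }
  return pointer
    .split("/")
    .slice(1) // drop the empty segment before the leading "/"
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~")) // unescape per RFC 6901
    .reduce((node, token) => {
      if (node === null || typeof node !== "object" || !(token in node)) {
        throw new Error(`Unresolvable ref: ${ref}`);
      }
      return node[token];
    }, root);
}

// Illustrative schema with a shared entity definition.
const rootSchema = {
  definitions: {
    historyEntry: {
      type: "object",
      properties: { url: { type: "string" }, title: { type: "string" } },
    },
  },
  properties: {
    entries: { type: "array", items: { $ref: "#/definitions/historyEntry" } },
  },
};

const entrySchema = resolveLocalRef(rootSchema, rootSchema.properties.entries.items.$ref);
console.log(entrySchema === rootSchema.definitions.historyEntry); // true
```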

So far, it's loading the right documents and validating them, and validation fails as expected if you put invalid data in the session data. But it's not yet flagging missing data correctly. I think there are a few cases where data can be wholly missing but, if it's there, it should be valid, and some where it must be there (required).
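For the missing-data cases, draft-07 separates the two concerns: a key listed under properties but not under required may be absent, but must validate if present, while a key listed under required must be present. The hypothetical checkProperties below implements only that subset (plus a shallow type check), and which history-entry fields are actually required is an assumption made for illustration.

```javascript
// Minimal sketch of draft-07 "properties" + "required" semantics (subset only).
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (Number.isInteger(value)) return "integer";
  return typeof value; // "string", "number", "boolean" or "object"
}

function checkProperties(value, schema) {
  const has = (key) => Object.prototype.hasOwnProperty.call(value, key);
  const errors = [];
  // Keys listed in "required" must be present.
  for (const key of schema.required ?? []) {
    if (!has(key)) errors.push(`${key}: required property is missing`);
  }
  // Keys listed in "properties" may be absent, but must validate when present.
  for (const [key, propSchema] of Object.entries(schema.properties ?? {})) {
    if (!has(key)) continue;
    const actual = jsonType(value[key]);
    const ok =
      actual === propSchema.type || (propSchema.type === "number" && actual === "integer");
    if (!ok) errors.push(`${key}: expected ${propSchema.type}, got ${actual}`);
  }
  return errors;
}

// Hypothetical history-entry schema: "url" required, the rest optional.
const historyEntrySchema = {
  type: "object",
  required: ["url"],
  properties: {
    url: { type: "string" },
    title: { type: "string" },
    hasUserInteraction: { type: "boolean" },
  },
};

console.log(checkProperties({ url: "https://example.com/" }, historyEntrySchema)); // []
console.log(checkProperties({ title: "Page title" }, historyEntrySchema));
// [ 'url: required property is missing' ]
```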

Attachment #9480168 - Attachment description: WIP: Bug 1905958 - Smoke test the schema by validating a sample session restore document → Bug 1905958 - Add a JSON schema for browser session state and a way to validate against it. r?#sessionstore-reviewers
Pushed by sfoster@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/371df68a102a Add a JSON schema for browser session state and a way to validate against it. r=sessionstore-reviewers,sidebar-reviewers,nsharpley,dwalker
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 140 Branch
QA Whiteboard: [qa-triage-done-c141/b140]
