Closed Bug 1534222 Opened 6 years ago Closed 5 years ago

LSNG: Check for maximum message size before calling SendCheckpoint()

Categories

(Core :: Storage: localStorage & sessionStorage, defect, P2)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- unaffected
firefox67 --- disabled
firefox67.0.1 --- disabled
firefox68 --- wontfix
firefox69 --- fix-optional
firefox70 --- fix-optional

People

(Reporter: janv, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Blocks: 1517090
Crash Signature: [mozilla::dom::PBackgroundLSSnapshotChild::SendCheckpoint(nsTArray<mozilla::dom::LSWriteInfo> const&)]
Crash Signature: [mozilla::dom::PBackgroundLSSnapshotChild::SendCheckpoint(nsTArray<mozilla::dom::LSWriteInfo> const&)] → [@ mozilla::ipc::ProcessLink::SendMessage | IPC_Message_Name=PBackgroundLSSnapshot::Msg_Checkpoint]
Priority: -- → P2
Assignee: nobody → jvarga
Priority: P2 → P1
Status: NEW → ASSIGNED
No longer blocks: 1517090
Blocks: 1540402

Jan, could you please provide an estimate timeline for the patch?

Flags: needinfo?(jvarga)

(In reply to Neha Kochar [:neha] from comment #1)

Jan, could you please provide an estimate timeline for the patch?

Couple of days, likely next Monday.

Flags: needinfo?(jvarga)

There's still the 5 MB limit, but a page can cause a lot of local storage changes during execution of a JS function.
The limit is not reached in the end, but all the changes can create a huge "database log".

For example, this loop will cause the crash (max message size is 256MB):
for (var i = 0; i < 26; i++) {
randomlyChange(largeAlmost5MegaByteBuffer);
localStorage.setItem("foo", largeAlmost5MegaByteBuffer);
}

Andrew, here are options for fixing this bug, we need to decide which approach to take:

  1. Use the WriteOptimizer on the child side too
    This would decrease memory requirements I believe, but so far we always distributed all storage events across content processes. This could regress some web sites.

  2. Use data compression
    This would also decrease memory requirements, but it's not trivial to implement and we need to watch for max message size anyway.

  3. Support sending partial checkpoints
    The child would have to create a new separate write info array, once the size of existing array is close to max message size.
    Then when the stable state callback is called, we call SendCheckpoint multiple times and the last one would signalize that it's the last message. Parent would have allocate separate arrays too and apply changes only when everything has been received.
    The downside is that this can temporarily cause huge memory allocations if a page does something bad intentionally.
    The upside is that snapshots would be still consistent as we originally planned.

  4. Send checkpoint when the size of write info array is close to max message size
    We would just track the size of current write info array and call SendCheckpoint when the size is close to max message size, we wouldn't wait for the stable state callback.
    The downside is that while the existing snapshot would still see the original state, any new snapshots created in other content processes would see "partial" changes from the original content process.
    The upside is that this doesn't need a lot of memory.

If you know about any other options, let me know.

Flags: needinfo?(bugmail)

(LSNG isn't shipping in 67 so updating the tracking flags accordingly)

We still need to check the max message size, but fixing bug 1546310 and bug 1513937 may mitigate this issue.

I think using the WriteOptimizer in the client is the right way to go.

Flags: needinfo?(bugmail)

Are we still planning to land a max-message-size test in 68?

Flags: needinfo?(jvarga)

No, we're going to implement a WriteOptimizer that will coalesce operations, so final IPC message would never reach the IPC maximum message size. That should happen right after bug 1540777.

Flags: needinfo?(jvarga)

We were looking at this in Regression Triage today; are we still planning on fixing this in 68 (as it's a P1)? Thanks!

Flags: needinfo?(jvarga)

Yes.

Flags: needinfo?(jvarga)

Andrew, I'm afraid that we can't use write optimizer in the client, take a look at this test:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/webstorage/event_local_removeitem.html

Flags: needinfo?(bugmail)

As per the discussion in a vidyo call, I think only do the write optimizer for IPC purposes is a workable simplification. Behavior within the same process would be the same, so that test would still pass. And as we start doing origin-per-process, it's less likely that a site would perceive dubious behavior.

Alternately, we could allow some level of redundancy... but I think it's clear that if an origin is generating 256MB of deltas that it's appropriate to engage in some type of https://github.com/WICG/interventions.

Flags: needinfo?(bugmail)

Yeah, but it will take some time to have origin-per-process. Andrew also mentioned that parent could tell the child if there are cross-process listeners at the time the snapshot is created. If there are no listeners we can safely use the write optimizer on the child side too. I think that would be a good compromise since the chance of having two or more tabs open for the same origin and having a cross-process listener for the storage event and having a function that produces a lot of changes is not very high.

No longer blocks: 1539835
Priority: P1 → P2
Assignee: jvarga → nobody
Status: ASSIGNED → NEW

Too late for a fix in 68 but we could still likely take a patch for 69.

Dropping the issue from regression triage at this point; it has a priority set and the team is aware of the issue.

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.